RESEARCH

We conduct research on the realization of intelligent systems and robotic platforms capable of operating in real-world environments.

Learning methods for physical AI

Imitation and reinforcement learning for dexterous manipulation

Imitation learning enables a robot to learn tasks simply from human demonstrations of the desired movements, without any special programming. For example, if a human shows how to grasp and lift a banana, the robot can learn to perform the same task. This makes it easier to introduce robots without expert knowledge, and we expect adoption to begin with familiar, everyday tasks. Our team is advancing this type of imitation learning by using language models like GPT to understand the intent behind a task from only a few demonstrations and to generate reliable motion trajectories.

It is very difficult for humans to provide accurate demonstrations of fast and complex tasks such as transporting, rolling, or throwing objects on a tray. Therefore, we are also focusing on reinforcement learning, which allows robots to learn without relying on human demonstrations. Reinforcement learning enables robots to acquire behaviors through trial and error and is suitable for learning complex motions. By training in simulation under various randomized conditions, we can achieve zero-shot transfer in which the robot operates stably in the real world without additional fine‑tuning. In our research, we demonstrated non‑prehensile manipulation tasks such as (A) transportation, (B) reorientation, and (C) throwing, and we aim to extend this approach to cover a wider variety of tasks in the future.
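As a rough illustration of training "under various randomized conditions", the sketch below draws a fresh set of simulator parameters for every episode. The parameter names (`friction`, `object_mass_kg`, `action_delay_s`) and their ranges are assumptions chosen for illustration; they are not the settings used in our experiments, and the simulator and learning algorithm are left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical physics parameters randomized per training episode."""
    friction: float        # tray-object friction coefficient
    object_mass_kg: float  # mass of the manipulated object
    action_delay_s: float  # actuation latency


# Assumed ranges, for illustration only.
RANGES = {
    "friction": (0.2, 1.0),
    "object_mass_kg": (0.05, 0.5),
    "action_delay_s": (0.0, 0.04),
}


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw one set of simulator parameters uniformly from RANGES."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def randomized_episodes(num_episodes: int, seed: int = 0) -> list[SimParams]:
    """Return the parameters each training episode would be simulated with."""
    rng = random.Random(seed)
    return [sample_sim_params(rng) for _ in range(num_episodes)]


if __name__ == "__main__":
    for params in randomized_episodes(3):
        print(params)
```

A policy trained across many such draws never sees exactly the same physics twice, which is the intuition behind expecting it to tolerate the mismatch between simulation and the real robot.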

Scaling up learning models / datasets

The advent of “foundation models” has dramatically advanced AI for language and vision. Foundation models learn from massive amounts of web data and achieve stronger capabilities than traditional AI models. Recently, a similar trend has emerged in robotics. Robotic foundation models, which learn from large-scale data and experience, are expected to perform tasks flexibly and dexterously rather than being limited to predefined ones.
In our team, members with diverse expertise advance research and development of robotic foundation models, focusing on robot technologies that are truly useful in real industrial settings.

Human-centric learning

In real workplaces, robots must not only become smarter on their own but also learn and work in ways that adapt to humans. For example, in collaborative tasks, a robot may predict a person’s next intention from their movements, hand over objects while avoiding dangerous proximity, or help support heavy items. We are developing learning models that incorporate human-derived sensor information, such as skeletal data, and studying explainability, that is, how to convey the predicted movements of humans and robots.
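To make "avoiding dangerous proximity" concrete, the sketch below checks the distance between a robot tool position and a set of tracked skeleton joints. It is a hand-written geometric check, not one of our learned models; the joint names, the function names, and the 0.15 m margin are all hypothetical.

```python
import math


def min_joint_distance(joints: dict[str, tuple[float, float, float]],
                       tool_pos: tuple[float, float, float]) -> float:
    """Smallest Euclidean distance (m) from the tool to any skeleton joint."""
    return min(math.dist(pos, tool_pos) for pos in joints.values())


def is_too_close(joints, tool_pos, safety_margin_m: float = 0.15) -> bool:
    """True if any joint lies within the (assumed) safety margin of the tool."""
    return min_joint_distance(joints, tool_pos) < safety_margin_m


if __name__ == "__main__":
    # Hypothetical joint positions in metres, in the robot base frame.
    skeleton = {"right_wrist": (0.50, 0.10, 0.90), "head": (0.60, 0.00, 1.60)}
    print(is_too_close(skeleton, tool_pos=(0.50, 0.10, 1.00)))
```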

Data collection techniques

Experience augmentation in simulation

Simulation has the advantage of allowing us to obtain information that cannot be acquired in the real world. It also enables the creation of diverse environmental scenes, including situations that do not exist in reality. We are studying learning methods for robot perception and object manipulation that make use of these characteristics of simulation.
One example is predicting the forces acting on objects in a scene from visual information. By using a model trained through virtual experiences in simulation, it becomes possible to estimate the forces acting on overlapping objects, which are difficult to measure in the real world. Force is an important factor that determines object motion, and being able to predict it opens up possibilities for various task applications.

Estimating Object Softness from Vision

When grasping an object, it is necessary to determine the grasping strategy by considering its physical properties. Softness, in particular, is important, as ignoring deformation may lead to damaging the object or failing to grasp it. To address this, we are developing a method that enables grasping while considering softness without touching the object. Since tactile experience is associated with visual perception, estimating softness from vision alone allows a robot hand without tactile sensors—and with limited control over grasping force—to appropriately grasp soft objects across various scenes.

High-fidelity digital twin environment from real data

Digital twins are technologies that recreate real-world environments, such as stores, homes, factories, and logistics facilities, as photorealistic and physically accurate 3D models. In recent years, it has become increasingly feasible to generate high-precision 3D scans from videos captured with smartphones, without relying on specialized equipment. By analyzing recorded footage, we simultaneously estimate the camera’s position and orientation and the three-dimensional structure of the scene. Using this data together with the latest 3D Gaussian Splatting methods, we can produce smooth, high-quality 3D visualizations from arbitrary viewpoints.
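The estimated camera position and orientation are what link each video frame to the 3D scene. As a minimal sketch of that link, the function below projects a 3D point into pixel coordinates with an ideal pinhole camera. It is not our reconstruction pipeline and not Gaussian Splatting itself; the function name, the world-to-camera rotation convention, and the intrinsics (`focal_px`, `cx`, `cy`) are assumptions made for illustration.

```python
def world_to_pixel(point_w, rotation, camera_pos, focal_px, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    rotation   : 3x3 world-to-camera rotation matrix, given as a list of rows
    camera_pos : camera centre in world coordinates
    focal_px   : focal length in pixels; (cx, cy) is the principal point
    Returns None when the point lies behind the camera.
    """
    # Express the point relative to the camera centre, then rotate it
    # into the camera frame.
    d = [point_w[i] - camera_pos[i] for i in range(3)]
    x, y, z = (sum(rotation[r][i] * d[i] for i in range(3)) for r in range(3))
    if z <= 0:
        return None
    # Perspective division followed by the shift to the principal point.
    return (focal_px * x / z + cx, focal_px * y / z + cy)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(world_to_pixel((1, 0, 2), identity, (0, 0, 0), 500, 320, 240))
```

Reconstruction methods solve the inverse problem: given many pixel observations of the same points across frames, they search for the camera poses and 3D structure that make projections like this one agree with the video.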
These digitally reconstructed environments and objects can be integrated into interactive robot simulations. By combining them with tools such as NVIDIA Omniverse and Isaac Sim, we perform research and development involving reinforcement learning, imitation learning, and control via ROS/MoveIt. Being able to conduct safe and efficient trial‑and‑error in an environment that closely resembles the real world offers significant benefits—not only for improving robotic capabilities but also for enhancing the quality of task design and validation in real‑world settings.

Grasping Interfaces with Embodiment

Through evolution, living organisms have developed highly effective strategies for adaptation and functional efficiency, in particular performing the diverse tasks needed for survival with a single end-effector. Bio-inspiration drawn from biological systems, including humans, is therefore a well-established and effective methodology for the design and control of robotic mechanisms. For complex tasks such as assembly, disassembly, and object manipulation, biologically inspired approaches that support learning and adaptability are especially valuable.

In this topic, we aim to develop bio-inspired end-effectors for human–robot systems to achieve high efficiency and adaptability across diverse operational scenarios. To this end, we leverage the imitation of both mechanical properties and sensing modalities of biological systems, enabling the reproduction of similar operations through subsequent learning processes.

Data Collection Interface for Imitation Learning

To build robot foundation models and imitation learning models, the quality of data is crucial, and the data collection method must be carefully designed. Teleoperation-based and leader–follower-type systems have been developed and used, but they face challenges in intuitiveness and scalability. To address this, we are developing a hand-shaped device that enables high-quality and easily scalable data collection using only open–close hand motions. In addition to developing the hand itself, we need to build an interface that allows more intuitive operation with human hands. The system must also be built as a unified platform that tracks the pose of the device and collects multimodal data at the same time.

Model evaluation methods

Benchmarks for Physical AI

In robotics research and development, it is important to address problems such as how well robots handle different objects and adapt to new situations. These problems need to be examined from different angles and require ongoing improvement. Recent work uses a wide range of data to move the field forward, bringing worldwide attention to the quality and quantity of the data. Our team has created and shared a large dataset for teaching robots to handle objects and for evaluating their performance across different types of robots, especially those with two arms. The dataset has over 10,000 examples and covers more than 100 tasks, from simple pick-and-place actions to more difficult assembly and teamwork with two hands. Using data and clear standards is key to building robots that work well anywhere.
By sharing this dataset, we want to help the robotics research community keep improving and create robots that can do more.
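As a small illustration of evaluating performance across tasks, the sketch below aggregates per-task success rates from a list of evaluation episodes. The record fields (`task`, `robot`, `success`) are hypothetical and do not describe the actual schema of our dataset.

```python
from collections import defaultdict


def success_rate_by_task(episodes):
    """Return {task: fraction of successful episodes} for the given records."""
    counts = defaultdict(lambda: [0, 0])  # task -> [successes, trials]
    for ep in episodes:
        counts[ep["task"]][0] += int(ep["success"])
        counts[ep["task"]][1] += 1
    return {task: s / n for task, (s, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical evaluation records.
    records = [
        {"task": "pick_and_place", "robot": "dual_arm_a", "success": True},
        {"task": "pick_and_place", "robot": "dual_arm_b", "success": False},
        {"task": "assembly", "robot": "dual_arm_a", "success": True},
    ]
    print(success_rate_by_task(records))
```

Grouping by `robot` instead of `task` in the same way would give a per-platform comparison.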