
In a move straight out of science fiction, Google has announced two new AI models that will be used to power physical robots.

The models are designed to help robots perform “a wider range of real-world tasks than ever before.”

The tech giant is partnering with Apptronik to build humanoid robots powered by Gemini 2.0.

Gemini Robotics and Gemini Robotics-ER are two models set to power a new generation of intelligent robots.

What is Gemini Robotics?

Gemini Robotics is Google's new advanced vision-language-action model, capable of understanding situations it has never been trained on.

The model is built on Google's flagship multimodal model, Gemini 2.0, and adds physical actions as an entirely new output modality designed specifically for controlling robots.

Gemini Robotics takes a big step forward in its ability to generalize to novel situations and solve a wider variety of tasks, dealing with new environments, diverse instructions, and unfamiliar objects.

Read: Meet Figure 02: The AI-Powered Robot Workforce of the Future


The model performs more than twice as well on comprehensive generalization benchmarks compared to other leading vision-language-action models.

Gemini Robotics is also much more interactive than previous models. This helps the robots to interact with people and the environment, as well as quickly adapt to changes.

Read: Is Gemini Racist? Google’s AI Pulled Amidst Bias Allegations

Through advanced language understanding, Gemini Robotics can respond to commands given in conversational language. By continuously monitoring its surroundings and detecting changes in the environment, it can automatically adjust its actions.

Google also highlights the importance of prioritising dexterity. Current iterations can perform complex, multi-step tasks that require precise manipulation, as demonstrated by robots folding origami.

What is Gemini Robotics-ER?

Gemini Robotics-ER is Google’s new model with advanced spatial understanding, designed to let roboticists run their own programs using Gemini’s embodied reasoning (ER) capabilities.

Gemini Robotics-ER is fundamentally about enhancing a robot's ability to "understand" its environment in three dimensions. It can accurately perceive the location, size, and orientation of objects in space; understand how objects relate to each other, such as "on top of," "inside," or "next to"; and, crucially, determine the optimal way to pick up an object, considering its shape, size, and orientation.
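As a toy illustration of this kind of spatial-relation reasoning, the sketch below classifies how two objects relate using only their horizontal and vertical coordinates. This is a deliberately simplified stand-in, not Google's actual embodied-reasoning method, which works from full 3-D perception.

```python
# Toy sketch of spatial-relation reasoning, reduced to 2-D coordinates.
# A real embodied-reasoning model infers relations from 3-D perception;
# the thresholds and object dictionaries here are illustrative assumptions.

def relation(a, b):
    """Describe how object a relates to object b from centre positions."""
    horizontally_aligned = abs(a["x"] - b["x"]) < 0.1
    if horizontally_aligned and a["z"] > b["z"]:
        return "on top of"
    if abs(a["x"] - b["x"]) < 1.0:
        return "next to"
    return "away from"

mug = {"x": 0.0, "z": 1.0}    # mug centred above the table surface
table = {"x": 0.0, "z": 0.5}
print(relation(mug, table))   # on top of
```

Even this crude rule-based version shows why spatial understanding matters for grasping: the relation between objects constrains which approach angles are physically possible.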

What is Gemini Robotics On-Device?

Gemini Robotics On-Device is Google’s new AI model that is designed to run directly on robotic devices, without needing a constant internet connection or relying on cloud computing.

Gemini Robotics On-Device is a vision-language-action (VLA) model. This architecture provides the three key capabilities needed for an autonomous robot.

Vision refers to the robot's ability to “see” and interpret its surroundings using cameras and sensors. VLA models can process visual input to identify objects, understand their properties, and perceive the environment.

Language refers to the robot's ability to understand and respond to natural language commands from humans. This is the same process that powers large language models like OpenAI’s ChatGPT and Google’s Gemini.

Action refers to the robot's ability to translate visual and language inputs into physical actions. This can mean controlling moving parts like arms or grippers in order to interact with the real world.
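The three capabilities above can be sketched as a simple perceive-interpret-act loop. All of the function names and data below are hypothetical placeholders for illustration; they are not a real Gemini Robotics API.

```python
# Minimal sketch of a vision-language-action (VLA) control loop.
# Every function and value here is an illustrative assumption.

def perceive(frame):
    """Vision: turn a camera frame into a list of detected objects."""
    # A real system would run an object detector on the frame here.
    return [{"name": "mug", "position": (0.4, 0.1, 0.0)}]

def interpret(command, objects):
    """Language: match a natural-language command to a detected object."""
    for obj in objects:
        if obj["name"] in command.lower():
            return obj
    return None

def act(target):
    """Action: translate the chosen target into a motor command."""
    if target is None:
        return "idle"
    return f"move gripper to {target['position']}"

objects = perceive("camera frame bytes")
target = interpret("Pick up the mug", objects)
print(act(target))  # move gripper to (0.4, 0.1, 0.0)
```

The point of the sketch is the data flow: vision produces structured perceptions, language grounds a command in those perceptions, and action converts the result into motion.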

On-device processing

The standout feature of Gemini Robotics On-Device is in the name: unlike typical AI models that send data to cloud servers for processing, Gemini Robotics On-Device handles everything locally on the robot itself.

On-device processing allows the model to operate at low latency. Latency is the lag between an action, such as a command or a sensor being triggered, and the system's response. In robotics, it is the time from the robot ‘seeing’ something to processing the information and executing a physical action. By processing on-device, network delays are eliminated, allowing the robot to react in near real-time.

It also facilitates offline operation, meaning the robot can function without a Wi-Fi connection. This is valuable in areas with inconsistent connectivity, such as disaster zones, as well as in secure facilities where data cannot leave the premises due to security protocols.

On-device processing also helps to assuage privacy concerns. When data is processed on-device it doesn't need to be transmitted to and stored on remote cloud servers. This reduces the attack surface for data breaches. It also minimizes the risk of sensitive information being intercepted or accessed by third parties. This is especially important for robots in personal environments like homes, healthcare settings that deal with patient data, or organisations with private information.

Improved dexterity and task generalization

Dexterity in robotics refers to the ability to manipulate objects with skill, precision, and adaptability. General purpose means it's not limited to one specific task but can handle a wider spectrum of activities.

Gemini Robotics On-Device is designed to go beyond repetitive industrial tasks and tackle more nuanced actions that require fine motor control, object recognition, and adaptive gripping.

Low-shot learning

Low-shot learning means an AI model is able to learn new skills from a small number of training examples.

Instead of needing extensive data for every variation of a task, a robot can quickly learn new skills, making it much more flexible and adaptable for rapidly changing needs.
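A toy way to see the idea: with only a handful of labelled example commands, a system can still generalise to a new command by similarity. The word-overlap trick below is a deliberately crude stand-in; a real robot model would adapt from demonstrations rather than text matching, and all the examples are invented.

```python
# Toy illustration of low-shot learning: classify a new command
# from only three labelled examples, using word overlap as similarity.
# The examples and the similarity measure are illustrative assumptions.

examples = [
    ("fold the shirt", "folding"),
    ("fold this towel", "folding"),
    ("pour water into the cup", "pouring"),
]

def overlap(a, b):
    """Count words shared between two commands."""
    return len(set(a.split()) & set(b.split()))

def predict(command):
    """Label a new command by its most similar known example."""
    best_example = max(examples, key=lambda ex: overlap(ex[0], command))
    return best_example[1]

print(predict("fold the napkin"))  # folding
```

The key property carries over to robotics: none of the examples mention a napkin, yet the system handles the new variation without extra data for it, which is what makes low-shot learners flexible under rapidly changing needs.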

Designed for bi-arm robots

Bi-arm robots are, as the name suggests, robots with two arms. This allows for more complex tasks that require holding an object with one arm while manipulating it with the other, or performing tasks that require two points of contact.

Gemini Robotics On-Device represents a significant step towards creating truly autonomous and versatile robots. By prioritising on-device functionality, Google ensures the next phase of robotics is not only more practical but also more responsive, secure, and adaptable for real-world applications.