In a move straight out of science fiction, Google has announced two new AI models that will be used to power physical robots.

The models are designed to help robots perform “a wider range of real-world tasks than ever before.”

The tech giant is partnering with Apptronik to build humanoid robots powered by Gemini 2.0.

Gemini Robotics and Gemini Robotics-ER are two models set to power a new generation of intelligent robots.

What is Gemini Robotics?

Gemini Robotics is Google's new advanced vision-language-action model, capable of handling situations it has not been trained on.

The model is built on Google's flagship multimodal model, Gemini 2.0. It adds physical actions as an entirely new output modality, designed specifically for controlling robots.

Gemini Robotics takes a big step forward in its ability to generalize to novel situations as well as solve a wider variety of tasks. It can deal with new environments, diverse instructions and objects.

The model performs more than twice as well as other leading vision-language-action models on comprehensive generalization benchmarks.

Gemini Robotics is also much more interactive than previous models. This helps the robots to interact with people and the environment, as well as quickly adapt to changes.

Through advanced language understanding capabilities, Gemini Robotics is able to respond to commands given in everyday, conversational language. By continuously monitoring its surroundings and detecting changes in the environment, it can automatically adjust its actions.

Google also highlights the importance of dexterity. Current iterations are able to perform complex, multi-step tasks that require precise manipulation, which Google has demonstrated by having the robots fold origami.

What is Gemini Robotics-ER?

Gemini Robotics-ER is Google’s new model with advanced spatial understanding. It is designed to enable roboticists to run their own programs using Gemini’s embodied reasoning (ER).

Gemini Robotics-ER is fundamentally about enhancing a robot's ability to "understand" its environment in three dimensions. It is able to accurately perceive the location, size, and orientation of objects in space; understand how objects relate to each other, such as "on top of," "inside," or "next to"; and, crucially, determine the optimal way to pick up an object, considering its shape, size, and orientation.
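
To make the idea of spatial relations concrete, here is a minimal Python sketch of how a program might label relations such as "on top of," "inside," and "next to" once it has the location and size of each object. This is not Google's code or the Gemini API: the `Box` class, the `relation` function, and the tolerance values are all hypothetical, and the sketch simplifies objects to axis-aligned boxes, ignoring orientation and grasp planning.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 3D box: centre (x, y, z) and full size, in metres."""
    name: str
    center: tuple[float, float, float]
    size: tuple[float, float, float]

    def lo(self, axis: int) -> float:
        # Lower bound of the box along the given axis.
        return self.center[axis] - self.size[axis] / 2

    def hi(self, axis: int) -> float:
        # Upper bound of the box along the given axis.
        return self.center[axis] + self.size[axis] / 2


def overlaps(a: Box, b: Box, axis: int) -> bool:
    """True if the two boxes' extents overlap along one axis."""
    return a.lo(axis) < b.hi(axis) and b.lo(axis) < a.hi(axis)


def relation(a: Box, b: Box, tol: float = 0.01, near: float = 0.05) -> str:
    """Describe where box a sits relative to box b (z axis points up)."""
    # "inside": a's extent lies within b's extent on every axis.
    if all(b.lo(i) <= a.lo(i) and a.hi(i) <= b.hi(i) for i in range(3)):
        return "inside"
    # "on top of": footprints overlap and a's bottom rests on b's top.
    if overlaps(a, b, 0) and overlaps(a, b, 1) and abs(a.lo(2) - b.hi(2)) <= tol:
        return "on top of"
    # "next to": heights overlap and the horizontal gap is small (assumed 5 cm).
    gap = max(
        max(a.lo(0) - b.hi(0), b.lo(0) - a.hi(0)),
        max(a.lo(1) - b.hi(1), b.lo(1) - a.hi(1)),
    )
    if overlaps(a, b, 2) and 0 <= gap <= near:
        return "next to"
    return "unrelated"


if __name__ == "__main__":
    table = Box("table", (0.0, 0.0, 0.35), (1.0, 0.6, 0.7))
    cup = Box("cup", (0.2, 0.1, 0.75), (0.08, 0.08, 0.10))
    print(f"{cup.name} is {relation(cup, table)} the {table.name}")
```

A real system would have to estimate these boxes from camera input first, which is the hard part that a model like Gemini Robotics-ER is meant to handle; the sketch only covers the final, geometric step.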