Google DeepMind unveiled Gemini Robotics 1.5 AI models to power general-purpose robots
By NewsTainmentOra
September 28, 2025
On Thursday, Google DeepMind announced two new artificial intelligence (AI) models in its Gemini Robotics family. The Gemini Robotics-ER 1.5 and Gemini Robotics 1.5 models work together to power general-purpose robots, and the company says they outperform its previous embodied AI models in reasoning, perception, and action across a wide range of real-world settings. The ER 1.5 model is designed to act as the planner, or orchestrator, while the 1.5 model carries out tasks from plain-language instructions.
Google DeepMind's Gemini AI models can function as the brain of a robot
In a blog post, DeepMind described the two new Gemini Robotics models, which are intended for general-purpose robots operating in the real world. Generative AI has driven a significant advance in robotics, replacing older command interfaces with natural language instructions.
However, several hurdles remain in using AI models as a robot's brain. Large language models, for instance, struggle to reason about spatial and temporal dimensions or to produce precise movements for objects of varying shapes. These problems have been compounded by a single AI model handling both planning and execution, which makes the process error-prone and slow.
Google's answer to this problem is a two-model arrangement. Gemini Robotics-ER 1.5, a vision-language model (VLM), offers advanced reasoning and tool-calling capabilities and can generate work plans spanning several steps. According to the company, the model excels at making logical decisions in physical environments and can natively call tools such as Google Search to look up information. It is also said to achieve state-of-the-art (SOTA) results on a range of spatial understanding benchmarks.
Once the plan is drawn up, Gemini Robotics 1.5 takes over. This vision-language-action (VLA) model converts visual information and instructions into motor commands, allowing a robot to perform tasks. The model thinks through the most efficient way to complete an action before carrying it out, and it can describe its reasoning in plain language, which improves transparency.
Google says this approach lets robots better understand complex, multi-step commands and execute them in a single flow. For example, if a user asks a robot to sort items into the appropriate compost, recycling, and garbage bins, the AI system might first look up local recycling guidelines on the Internet, examine the objects in front of it, devise a sorting plan, and then carry it out, roughly along the lines of the sketch below.
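To make that division of labour concrete, here is an illustrative Python sketch of the handoff between the planner and the action model. The function names, data structures, and hard-coded steps are purely illustrative and are not DeepMind's actual interfaces.

# Illustrative sketch of the two-model split described above. None of these
# function names come from DeepMind's API; they only show how an orchestrator
# (Gemini Robotics-ER 1.5) could hand steps to a VLA model (Gemini Robotics 1.5).
from dataclasses import dataclass

@dataclass
class Step:
    instruction: str  # natural-language sub-task, e.g. "pick up the banana peel"

def plan_with_orchestrator(task: str, scene_description: str) -> list[Step]:
    """Hypothetical stand-in for the ER 1.5 planner: reason about the task
    (optionally calling tools such as web search) and emit a step-by-step plan."""
    # In the real system this would be a model call, not hard-coded steps.
    return [
        Step("look up local recycling guidelines"),
        Step("identify each item on the table"),
        Step("place compostable items in the green bin"),
        Step("place recyclables in the blue bin"),
        Step("place the remaining items in the trash bin"),
    ]

def execute_with_vla(step: Step) -> None:
    """Hypothetical stand-in for the 1.5 VLA model: turn camera frames and a
    short instruction into motor commands on the robot."""
    print(f"executing: {step.instruction}")

if __name__ == "__main__":
    plan = plan_with_orchestrator(
        "sort the items into the right bins",
        "cluttered table with food waste, cans, and paper",
    )
    for step in plan:
        execute_with_vla(step)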
Notably, the company claims the AI models were built to work across robots of different shapes and sizes, thanks to their spatial understanding and adaptability. Developers can currently access the orchestrator, Gemini Robotics-ER 1.5, through the Gemini application programming interface (API) in Google AI Studio, while the VLA model is available only to a select group of partners.
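For developers who want to try the orchestrator, a request might look roughly like the following minimal sketch using the google-genai Python SDK. The model identifier and image file name below are assumptions rather than confirmed values; check Google AI Studio for the published model name.

# Minimal sketch of calling the orchestrator model through the Gemini API
# with the google-genai Python SDK. The model identifier is an assumption;
# consult Google AI Studio for the exact published name.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Assumed example image of the robot's workspace.
with open("workbench.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed identifier
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "List the objects on the table and propose a step-by-step plan "
        "to sort them into compost, recycling, and trash bins.",
    ],
)
print(response.text)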