Gemini just got physical and you should prepare for a robot revolution

- Gemini Robotics is a new model
- It focuses on the physical world and will be used by robots
- It’s visual, interactive, and general
Google Gemini is good at many things that happen inside a screen, including generating text and images. The latest model, Gemini Robotics, is different: it's a vision-language-action model that moves generative AI into the physical world and could substantially speed up the race toward humanoid robots.
Gemini Robotics, which Google’s DeepMind unveiled on Wednesday, improves Gemini’s abilities in three key areas:
- Dexterity
- Interactivity
- Generalization
Each of these three aspects significantly impacts the success of robotics in the workplace and unknown environments.
Generalization allows a robot to take Gemini's vast knowledge about the world and the things in it, apply that knowledge to new situations, and accomplish tasks it has never been trained on. In one video, researchers present a pair of robot arms controlled by Gemini Robotics with a table-top basketball game and ask it to "slam dunk the basketball."
Even though the robot hadn’t seen the game before, it picked up the small orange ball and stuffed it through the plastic net.
Gemini Robotics also makes robots more interactive, able to respond not only to changing verbal instructions but also to unpredictable conditions.
In another video, researchers asked the robot to put grapes in a bowl with bananas, then moved the bowl around; the robot arm adjusted and still managed to drop the grapes into the bowl.
Google also demonstrated the robot's dexterous capabilities, which let it tackle tasks like playing tic-tac-toe on a wooden board, erasing a whiteboard, and folding paper into origami.
Instead of hours of training on each task, the robots respond to near-constant natural language instructions and perform the tasks without guidance. It’s impressive to watch.
Naturally, adding AI to robotics is not new.
Last year, OpenAI partnered with Figure AI to develop a humanoid robot that can work out tasks based on verbal instructions. As with Gemini Robotics, Figure 01's visual language model works with OpenAI's speech model to engage in back-and-forth conversations about tasks and changing priorities.
In the demo, the humanoid robot stands before dishes and a drainer. It's asked what it sees, which it lists, but then the interlocutor changes tasks and asks for something to eat. Without missing a beat, the robot picks up an apple and hands it to him.
While most of what Google showed in the videos was disembodied robot arms and hands working through a wide range of physical tasks, there are grander plans: Google is partnering with Apptronik to add the new model to its Apollo humanoid robot.
Google will connect the dots with an additional, more advanced visual language model called Gemini Robotics-ER (embodied reasoning).
Gemini Robotics-ER will enhance robots' spatial reasoning and should help robot developers connect the models to existing controllers.
Again, this should improve on-the-fly reasoning and make it possible for robots to quickly figure out how to grasp and use unfamiliar objects. Google calls Gemini Robotics-ER an end-to-end solution and claims it "can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning and code generation."
Google is providing the Gemini Robotics-ER model to several business- and research-focused robotics firms, including Boston Dynamics (maker of Atlas), Agile Robots, and Agility Robotics.
All in all, it's a potential boon for humanoid robotics developers. However, since most of these robots are designed for factories or are still in the laboratory, it may be some time before you have a Gemini-enhanced robot in your home.