Google says Gemini AI is making its robots smarter


Google is training its robots with Gemini AI so they can get better at navigation and completing tasks. The DeepMind robotics team explained in a new research paper how using Gemini 1.5 Pro’s long context window — which dictates how much information an AI model can process — allows users to more easily interact with its RT-2 robots using natural language instructions.
This works by filming a video tour of a designated area, such as a home or office space. Researchers then use Gemini 1.5 Pro to make the robot "watch" the video and learn about the environment. The robot can then carry out commands based on what it has observed, using verbal and/or image outputs. For example, after being shown a phone and asked "where can I charge this?", it can guide the user to a power outlet. DeepMind says its Gemini-powered robot had a 90 percent success rate across more than 50 user instructions given in an operating area of more than 9,000 square feet.
Researchers also found “preliminary evidence” that Gemini 1.5 Pro enabled its droids to plan how to fulfill instructions beyond just navigation. For example, when a user with lots of Coke cans on their desk asks the droid if their favorite drink is available, the team said Gemini “knows that the robot should navigate to the fridge, inspect if there are Cokes, and then return to the user to report the result.” DeepMind says it plans to investigate these results further.
The video demonstrations provided by Google are impressive, though the obvious cuts after the droid acknowledges each request hide the fact that, according to the research paper, it takes between 10 and 30 seconds to process these instructions. It may be some time before we're sharing our homes with more advanced environment-mapping robots, but at least these might be able to find our missing keys or wallets.