Google is training robots the way it trains AI chatbots
RT-2 is the new version of what the company calls its vision-language-action (VLA) model. The model teaches robots to better recognize visual and language patterns to interpret instructions and infer what objects work best for the request.
Researchers tested RT-2 with a robotic arm in an office kitchen setting, asking it to decide what makes a good improvised hammer (it chose a rock) and to choose a drink to give an exhausted person (a Red Bull). They also told the robot to move a Coke can to a picture of Taylor Swift. The robot is a Swiftie, and that is good news for humanity.
The new model was trained on web and robotics data, leveraging research advances in large language models like Google’s own Bard and combining them with robotic data (like which joints to move), the company said in a paper. It also understands directions in languages other than English.
For years, researchers have tried to imbue robots with better inference to troubleshoot how to exist in a real-life environment. As The Verge’s James Vincent pointed out, real life is uncompromisingly messy. Robots need more instruction just to do something that is simple for humans, such as cleaning up a spilled drink. Humans instinctively know what to do: pick up the glass, get something to sop up the mess, throw that out, and be careful next time.
Previously, teaching a robot took a long time. Researchers had to individually program directions. But with the power of VLA models like RT-2, robots can access a larger set of information to infer what to do next.
Google’s first foray into smarter robots started last year when it announced it would use its LLM PaLM in robotics, creating the awkwardly named PaLM-SayCan system to integrate the LLM with physical robotics.
Google’s new robot isn’t perfect. The New York Times got to see a live demo of the robot and reported it incorrectly identified soda flavors and misidentified fruit as the color white.
Depending on the type of person you are, this news is either welcome or reminds you of the scary robot dogs from Black Mirror (influenced by Boston Dynamics robots). Either way, we should expect an even smarter robot next year. It might even clean up a spill with minimal instructions.