Waymo wants to use Google’s Gemini to train its robotaxis
Waymo has long touted its ties to Google’s DeepMind and its decades of AI research as a strategic advantage over its rivals in the autonomous driving space. Now, the Alphabet-owned company is taking it a step further by developing a new training model for its robotaxis built on Google’s multimodal large language model (MLLM) Gemini.
Waymo released a new research paper today that introduces an “End-to-End Multimodal Model for Autonomous Driving,” also known as EMMA. This new end-to-end training model processes sensor data to generate “future trajectories for autonomous vehicles,” helping Waymo’s driverless vehicles make decisions about where to go and how to avoid obstacles.
But more importantly, this is one of the first indications that the leader in autonomous driving has designs to use MLLMs in its operations. And it’s a sign that these LLMs could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment on the road. In its research paper, Waymo is proposing “to develop an autonomous driving system in which the MLLM is a first class citizen.”
The paper outlines how, historically, autonomous driving systems have developed specific “modules” for the various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but has problems scaling “due to the accumulated errors among modules and limited inter-module communication.” Moreover, these modules could struggle to respond to “novel environments” because, by nature, they are “pre-defined,” which can make it hard to adapt.
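One simple way to picture the “accumulated errors among modules” problem is to treat each module as a stage that must succeed for the whole pipeline to succeed. The sketch below is only an illustration, not Waymo’s analysis: the per-module success rates are hypothetical numbers chosen for the example, and it assumes the modules fail independently of one another, which real systems may not.

```python
from math import prod


def pipeline_success_rate(module_rates):
    """Chance that every stage succeeds, assuming failures are independent."""
    return prod(module_rates)


# Hypothetical per-module success rates (illustrative, not from the paper).
modules = {
    "perception": 0.99,
    "mapping": 0.99,
    "prediction": 0.99,
    "planning": 0.99,
}

overall = pipeline_success_rate(modules.values())
print(f"Overall pipeline success rate: {overall:.4f}")
```

Under these assumptions, four stages that are each right 99 percent of the time are right together only about 96 percent of the time, and the gap widens as more stages are chained.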
Waymo says that MLLMs like Gemini present an interesting solution to some of these challenges for two reasons. First, such a model is a “generalist” trained on vast sets of scraped data from the internet “that provide rich ‘world knowledge’ beyond what is contained in common driving logs.” Second, these models demonstrate “superior” reasoning capabilities through techniques like “chain-of-thought reasoning,” which mimics human reasoning by breaking down complex tasks into a series of logical steps.
Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encountering various animals or construction in the road.
Other companies, like Tesla, have spoken extensively about developing end-to-end models for their autonomous cars. Elon Musk claims that the latest version of Tesla’s Full Self-Driving system (12.5.5) uses an “end-to-end neural nets” AI system that translates camera images into driving decisions.
The EMMA paper is a clear indication that Waymo, which has a lead on Tesla in deploying real driverless vehicles on the road, is also interested in pursuing an end-to-end system. The company said that its EMMA model excelled at trajectory prediction, object detection, and road graph understanding.
“This suggests a promising avenue of future research, where even more core autonomous driving tasks could be combined in a similar, scaled-up setup,” the company said in a blog post today.
But EMMA also has its limitations, and Waymo acknowledges that there will need to be future research before the model is put into practice. For example, EMMA couldn’t incorporate 3D sensor inputs from lidar or radar, which Waymo said was “computationally expensive.” And it could only process a small number of image frames at a time.
There are also risks to using MLLMs to train robotaxis that go unmentioned in the research paper. Chatbots like Gemini often hallucinate or fail at simple tasks like reading clocks or counting objects. Waymo has very little margin for error when its autonomous vehicles are traveling 40 mph down a busy road. More research will be needed before these models can be deployed at scale, and Waymo is clear about that.
“We hope that our results will inspire further research to mitigate these issues,” the company’s research team writes, “and to further evolve the state of the art in autonomous driving model architectures.”