Waymo wants to use Google’s Gemini to train its robotaxis


Waymo has long touted its ties to Google’s DeepMind and its decades of AI research as a strategic advantage over its rivals in the autonomous driving space. Now, the Alphabet-owned company is taking it a step further by developing a new training model for its robotaxis built on Google’s multimodal large language model (MLLM) Gemini.
Waymo released a new research paper today that introduces an “End-to-End Multimodal Model for Autonomous Driving,” also known as EMMA. This new end-to-end training model processes sensor data to generate “future trajectories for autonomous vehicles,” helping Waymo’s driverless vehicles make decisions about where to go and how to avoid obstacles.
But more importantly, this is one of the first indications that the leader in autonomous driving has designs to use MLLMs in its operations. And it’s a sign that these LLMs could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment on the road. In its research paper, Waymo is proposing “to develop an autonomous driving system in which the MLLM is a first class citizen.”
End-to-End Multimodal Model for Autonomous Driving, also known as EMMA
The paper outlines how, historically, autonomous driving systems have developed specific “modules” for the various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but has problems scaling “due to the accumulated errors among modules and limited inter-module communication.” Moreover, these modules could struggle to respond to “novel environments” because, by nature, they are “pre-defined,” which can make it hard to adapt.
Waymo says that MLLMs like Gemini offer a promising solution to some of these challenges for two reasons. First, they are “generalist” models trained on vast sets of data scraped from the internet “that provide rich ‘world knowledge’ beyond what is contained in common driving logs.” Second, they demonstrate “superior” reasoning capabilities through techniques like “chain-of-thought reasoning,” which mimics human reasoning by breaking complex tasks down into a series of logical steps.
Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encountering various animals or construction in the road.
Other companies, like Tesla, have spoken extensively about developing end-to-end models for their autonomous cars. Elon Musk claims that the latest version of Tesla’s Full Self-Driving system (12.5.5) uses “end-to-end neural nets” that translate camera images into driving decisions.
This is a clear indication that Waymo, which has a lead on Tesla in deploying real driverless vehicles on the road, is also interested in pursuing an end-to-end system. The company said that its EMMA model excelled at trajectory prediction, object detection, and road graph understanding.
“This suggests a promising avenue of future research, where even more core autonomous driving tasks could be combined in a similar, scaled-up setup,” the company said in a blog post today.
But EMMA also has its limitations, and Waymo acknowledges that more research will be needed before the model is put into practice. For example, EMMA couldn’t incorporate 3D sensor inputs from lidar or radar, which Waymo said was “computationally expensive.” It could also process only a small number of image frames at a time.
There are also risks to using MLLMs to train robotaxis that go unmentioned in the research paper. Chatbots like Gemini often hallucinate or fail at simple tasks like reading clocks or counting objects. Waymo has very little margin for error when its autonomous vehicles are traveling 40 mph down a busy road. More research will be needed before these models can be deployed at scale, and Waymo is clear about that.
“We hope that our results will inspire further research to mitigate these issues,” the company’s research team writes, “and to further evolve the state of the art in autonomous driving model architectures.”