Tesla backs vision-only approach to autonomy using powerful supercomputer


Tesla CEO Elon Musk has been teasing a neural network training computer called ‘Dojo’ since at least 2019. Musk says Dojo will be able to process vast amounts of video data to achieve vision-only autonomous driving. While Dojo itself is still in development, Tesla today revealed a new supercomputer that will serve as a development prototype version of what Dojo will ultimately offer.
At the 2021 Conference on Computer Vision and Pattern Recognition on Monday, Tesla’s head of AI, Andrej Karpathy, revealed the company’s new supercomputer, which allows the automaker to ditch radar and lidar sensors on self-driving cars in favor of high-quality optical cameras. During his workshop on autonomous driving, Karpathy explained that getting a computer to respond to a new environment the way a human can requires an immense dataset, and a massively powerful supercomputer to train the company’s neural network-based autonomous driving technology on that dataset. Hence the development of these predecessors to Dojo.
Tesla’s newest-generation supercomputer has 10 petabytes of “hot tier” NVMe storage running at 1.6 terabytes per second, according to Karpathy. At 1.8 EFLOPS, he said, it might be the fifth most powerful supercomputer in the world, though he conceded later that his team has not yet run the specific benchmark required to enter the TOP500 rankings.
“That said, if you take the total number of FLOPS it would indeed place somewhere around the fifth spot,” Karpathy told TechCrunch. “The fifth spot is currently occupied by NVIDIA with their Selene cluster, which has a very comparable architecture and similar number of GPUs (4480 vs ours 5760, so a bit less).”
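That figure roughly checks out. Assuming A100-class GPUs like those in Selene, each with a published peak of about 312 teraFLOPS of BF16 tensor throughput (an assumption for illustration; Tesla hasn’t confirmed the exact part), the GPU count Karpathy cites lands almost exactly on the quoted 1.8 EFLOPS:

```python
# Back-of-the-envelope check: do 5,760 GPUs add up to ~1.8 EFLOPS?
# Assumes A100-class GPUs at ~312 TFLOPS peak BF16 tensor throughput
# (NVIDIA's published A100 spec); the exact part is not confirmed.

GPUS = 5760
PEAK_TFLOPS_PER_GPU = 312  # A100 BF16 tensor-core peak

total_eflops = GPUS * PEAK_TFLOPS_PER_GPU * 1e12 / 1e18
print(f"{total_eflops:.2f} EFLOPS")  # ~1.80 EFLOPS, matching the quoted figure
```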
Musk has been advocating for a vision-only approach to autonomy for some time, in large part because cameras are faster than radar or lidar. As of May, Tesla Model Y and Model 3 vehicles in North America are being built without radar, relying instead on cameras and machine learning to support Autopilot, the company’s advanced driver assistance system.
When radar and vision disagree, which one do you believe? Vision has much more precision, so better to double down on vision than do sensor fusion.
— Elon Musk (@elonmusk) April 10, 2021
Many autonomous driving companies use lidar and high-definition maps, which means they require incredibly detailed maps of the areas where they operate, including all road lanes and how they connect, traffic lights and more.
“The approach we take is vision-based, primarily using neural networks that can in principle function anywhere on earth,” said Karpathy in his workshop.
Replacing a “meat computer,” or rather, a human, with a silicon computer results in lower latency (better reaction times), 360-degree situational awareness and a fully attentive driver who never checks their Instagram, said Karpathy.
Karpathy shared some scenarios showing how Tesla employs computer vision to correct bad driver behavior, including an emergency braking scenario in which the computer’s object detection kicks in to save a pedestrian from being hit, and a traffic control warning that can identify a yellow light in the distance and send an alert to a driver who hasn’t yet started to slow down.
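Tesla hasn’t published the logic behind these features, but the emergency braking scenario can be illustrated with a classic time-to-collision rule: brake when a detected pedestrian’s distance, divided by the speed at which the gap is closing, falls below a threshold. A minimal sketch, with hypothetical detection inputs and an illustrative threshold, neither of which is a Tesla parameter:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical output of a vision-based object detector."""
    label: str                 # e.g. "pedestrian"
    distance_m: float          # estimated distance along the driving path
    closing_speed_mps: float   # how fast the gap is shrinking (>0 = approaching)

TTC_BRAKE_THRESHOLD_S = 1.5  # illustrative value only

def should_emergency_brake(detections: list[Detection]) -> bool:
    """Trigger braking if any pedestrian's time-to-collision is too short."""
    for d in detections:
        if d.label == "pedestrian" and d.closing_speed_mps > 0:
            ttc = d.distance_m / d.closing_speed_mps
            if ttc < TTC_BRAKE_THRESHOLD_S:
                return True
    return False

# Example: pedestrian 10 m ahead, closing at 8 m/s, so TTC = 1.25 s: brake.
print(should_emergency_brake([Detection("pedestrian", 10.0, 8.0)]))  # True
```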
Tesla vehicles also already include a feature called pedal misapplication mitigation, in which the car identifies pedestrians in its path, or even the lack of a driving path, and responds when the driver accidentally steps on the accelerator instead of the brake, potentially saving pedestrians in front of the vehicle or preventing the driver from accelerating into a river.
Tesla’s system collects video at 36 frames per second from the eight cameras that surround each vehicle, which provides enormous amounts of information about the environment surrounding the car, Karpathy explained.
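To put rough numbers on that: eight cameras at 36 fps means 288 frames every second per car. Karpathy didn’t give per-frame sizes, but under an assumed 1280×960 resolution and 8-bit pixels (illustrative assumptions, not confirmed specs), the raw capture rate works out as follows:

```python
# Rough per-vehicle capture rate. Resolution and bit depth are
# illustrative assumptions; Tesla did not state per-frame sizes.
CAMERAS = 8
FPS = 36
WIDTH, HEIGHT = 1280, 960   # assumed resolution
BYTES_PER_PIXEL = 1         # assumed 8-bit, single channel

frames_per_second = CAMERAS * FPS  # 288 frames/s
bytes_per_second = frames_per_second * WIDTH * HEIGHT * BYTES_PER_PIXEL
print(f"{frames_per_second} frames/s, ~{bytes_per_second / 1e9:.2f} GB/s raw")
# -> 288 frames/s, ~0.35 GB/s before any compression
```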
While the vision-only approach is more scalable than collecting, building and maintaining high-definition maps everywhere in the world, it’s also much more of a challenge, because the neural networks handling object detection and driving have to be able to collect and process vast quantities of data at speeds that match a human’s depth and velocity recognition capabilities.
Karpathy said that after years of research, he believes the problem can be solved by treating it as a supervised learning problem. Engineers testing the tech found they could drive around sparsely populated areas with zero interventions, he said, but they “definitely struggle a lot more in very adversarial environments like San Francisco.” For the system to truly work well and eliminate the need for things like high-definition maps and additional sensors, it will have to get much better at handling densely populated areas.
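In practice, that framing means the driving task is trained like any other supervised vision problem: frames in, labeled targets out, minimize a loss, repeat at enormous scale. A minimal PyTorch-style sketch of that setup; the model, loss and data here are placeholders, not Tesla’s actual architecture:

```python
import torch
import torch.nn as nn

# Placeholder network standing in for Tesla's (unpublished) vision stack:
# camera frames in, per-object regression targets out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 3),  # e.g. depth, velocity, acceleration of one object
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(frames: torch.Tensor, targets: torch.Tensor) -> float:
    """One supervised update: predict from frames, regress to labeled targets."""
    optimizer.zero_grad()
    loss = loss_fn(model(frames), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch: 4 RGB frames and their labeled targets.
frames = torch.randn(4, 3, 96, 128)
targets = torch.randn(4, 3)
print(train_step(frames, targets))
```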
One of the Tesla AI team’s game changers has been auto-labeling, which lets it automatically label things like roadway hazards and other objects in millions of videos captured by the cameras on Tesla vehicles. Large AI datasets have often required extensive manual labeling, which is time-consuming, especially when trying to produce the kind of cleanly labeled dataset a supervised learning system needs to work well.
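Karpathy didn’t detail the pipeline, but the general auto-labeling pattern is well established: run a large model offline over recorded clips, keep its high-confidence predictions as labels, and route everything else to human reviewers. A hedged sketch of that pattern, with a hypothetical labeler model and confidence cutoff:

```python
from typing import Iterable

CONFIDENCE_CUTOFF = 0.95  # illustrative; real pipelines tune this carefully

def auto_label(clips: Iterable[dict], labeler) -> tuple[list[dict], list[dict]]:
    """Split clips into auto-labeled ones and ones needing human review.

    `labeler` is a stand-in for a large offline model that returns
    (objects, confidence) for a clip; Tesla's actual system is unpublished.
    """
    auto_labeled, needs_review = [], []
    for clip in clips:
        objects, confidence = labeler(clip)
        if confidence >= CONFIDENCE_CUTOFF:
            clip["labels"] = objects      # trust the model's output
            auto_labeled.append(clip)
        else:
            needs_review.append(clip)     # fall back to manual labeling
    return auto_labeled, needs_review
```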
With this latest supercomputer, Tesla has accumulated 1 million videos of around 10 seconds each and labeled 6 billion objects with depth, velocity and acceleration. All of this takes up a whopping 1.5 petabytes of storage. That sounds massive, but it will take far more before the company can achieve the reliability it needs from an automated driving system that relies on vision alone, hence the need to keep developing ever more powerful supercomputers in Tesla’s pursuit of more advanced AI.
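For a sense of the label density behind those totals, the quoted figures work out to about 6,000 labeled objects and roughly 1.5 gigabytes of storage per 10-second clip:

```python
# Per-clip averages implied by the quoted totals.
CLIPS = 1_000_000
OBJECTS = 6_000_000_000
STORAGE_PB = 1.5

print(OBJECTS / CLIPS)                  # 6000 objects per clip
print(STORAGE_PB * 1e15 / CLIPS / 1e9)  # ~1.5 GB per 10-second clip
```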