MIT’s newest computer vision algorithm identifies images down to the pixel

For humans, identifying items in a scene, whether that’s an avocado or an Aventador, a pile of mashed potatoes or an alien mothership, is as simple as looking at them. But for artificial intelligence and computer vision systems, developing a high-fidelity understanding of their surroundings takes a bit more effort. Well, a lot more effort. Around 800 hours of hand-labeling training images, if we’re being specific. To help machines better see the way people do, a team of researchers at MIT CSAIL, in collaboration with Cornell University and Microsoft, has developed STEGO, an algorithm able to label images down to the individual pixel.
Normally, creating CV training data involves a human drawing boxes around specific objects within an image (say, a box around the dog sitting in a field of grass) and labeling those boxes with what’s inside (“dog”), so that the AI trained on it will be able to tell the dog from the grass. STEGO (Self-supervised Transformer with Energy-based Graph Optimization), by contrast, performs semantic segmentation, which applies a class label to each pixel in the image to give the AI a more accurate view of the world around it.
Whereas a labeled box contains the object plus whatever else falls in the surrounding pixels within the boxed-in boundary, semantic segmentation labels every pixel in the object, and only the pixels that make up the object: you get just dog pixels, not dog pixels plus some grass too. It’s the machine learning equivalent of using the Smart Lasso in Photoshop versus the Rectangular Marquee tool.
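To make the box-versus-mask difference concrete, here is a minimal sketch in Python. The 6×6 grid and the `bounding_box` helper are made up for illustration and have nothing to do with STEGO's own code; the grid simply marks dog pixels with 1 and grass with 0.

```python
# Toy 6x6 "image" (hypothetical data): 1 marks a dog pixel, 0 marks grass.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the tightest box around the 1s."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


top, left, bottom, right = bounding_box(image)

# A box label covers every pixel inside the rectangle.
box_pixels = (bottom - top + 1) * (right - left + 1)

# A segmentation mask covers only the pixels that belong to the object.
dog_pixels = sum(sum(row) for row in image)

# Whatever is left over is grass that the box label wrongly calls "dog".
grass_in_box = box_pixels - dog_pixels

print(f"box covers {box_pixels} pixels, mask covers {dog_pixels}, "
      f"so {grass_in_box} grass pixels are mislabeled by the box")
```

In this toy case the tightest box spans 16 pixels while the dog itself occupies only 10, so more than a third of the box is grass.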
The problem with this technique is one of scope. Conventional multi-shot supervised systems often demand thousands, if not hundreds of thousands, of labeled images with which to train the algorithm. Multiply that by the 65,536 individual pixels that make up even a single 256×256 image, all of which now need to be individually labeled as well, and the workload required quickly spirals into impossibility.
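The arithmetic behind that workload is easy to check. The sketch below assumes the dataset sizes mentioned above (thousands to hundreds of thousands of images) and a 256×256 resolution; the `labels_needed` function name is just an illustrative choice.

```python
def labels_needed(num_images, width, height):
    """Number of per-pixel labels required to fully annotate a dataset."""
    return num_images * width * height


# One 256x256 image already needs 65,536 individual pixel labels.
pixels_per_image = labels_needed(1, 256, 256)
print(f"1 image: {pixels_per_image:,} labels")

# Scale that up to the dataset sizes supervised training typically demands.
for num_images in (1_000, 100_000):
    total = labels_needed(num_images, 256, 256)
    print(f"{num_images:,} images: {total:,} labels")
```

A thousand images already works out to roughly 65.5 million pixel labels, and a hundred thousand images to about 6.5 billion.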
Instead, “STEGO looks for similar objects that appear throughout a dataset,” the CSAIL team wrote in a press release Thursday. “It then associates these similar objects together to construct a consistent view of the world across all of the images it learns from.”
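The general idea of grouping similar things across a whole dataset without labels can be illustrated with a toy clustering example. To be clear, this is not STEGO's actual procedure: it is a plain k-means over invented two-dimensional "pixel features" pooled from two hypothetical images, meant only to show how pixels from different images can end up sharing one consistent label.

```python
import math


def nearest(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to centers, then move centers to the mean."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


# Hypothetical per-pixel feature vectors from two different images.
image_a = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)]
image_b = [(0.85, 0.15), (0.2, 0.8), (0.15, 0.95)]

# Cluster the pooled features so both images share the same set of groups.
pooled = image_a + image_b
centers = kmeans(pooled, centers=[pooled[0], pooled[2]])

labels_a = [nearest(p, centers) for p in image_a]
labels_b = [nearest(p, centers) for p in image_b]
print(labels_a, labels_b)
```

Because the clusters are fit on the pooled features rather than per image, group 0 means the same thing in both images, which is the "consistent view across all of the images" idea in miniature. No human ever names the groups; a person would only attach words like "dog" or "grass" to them afterward.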
“If you're looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don't know what the right objects should be,” MIT CSAIL PhD student, Microsoft Software Engineer, and the paper’s lead author Mark Hamilton said. “In these types of situations where you want to design a method to operate at the boundaries of science, you can't rely on humans to figure it out before machines do.”
Trained on a wide variety of image domains, from home interiors to high-altitude aerial shots, STEGO doubled the performance of previous semantic segmentation schemes, with results that closely matched the judgments of human annotators. What’s more, “when applied to driverless car datasets, STEGO successfully segmented out roads, people, and street signs with much higher resolution and granularity than previous systems. On images from space, the system broke down every single square foot of the surface of the Earth into roads, vegetation, and buildings,” the MIT CSAIL team wrote.
“In making a general tool for understanding potentially complicated data sets, we hope that this type of an algorithm can automate the scientific process of object discovery from images,” Hamilton said. “There's a lot of different domains where human labeling would be prohibitively expensive, or humans simply don’t even know the specific structure, like in certain biological and astrophysical domains. We hope that future work enables application to a very broad scope of data sets. Since you don't need any human labels, we can now start to apply ML tools more broadly.”
Despite outperforming the systems that came before it, STEGO does have limitations. For example, it can identify both pasta and grits as “food-stuffs” but doesn’t differentiate between them very well. It also gets confused by nonsensical images, such as a banana sitting on a phone receiver. Is this a food-stuff? Is this a pigeon? STEGO can’t tell. The team hopes to build a bit more flexibility into future iterations, allowing the system to identify objects under multiple classes.