Facebook is researching AI systems that see, hear, and remember everything you do

Facebook is pouring a lot of time and money into augmented reality, including building its own AR glasses with Ray-Ban. Right now, these gadgets can only record and share imagery, but what does the company think such devices will be used for in the future?
A new research project led by Facebook’s AI team suggests the scope of the company’s ambitions. It imagines AI systems that constantly analyze people’s lives using first-person video, recording what they see, do, and hear in order to help them with everyday tasks. Facebook’s researchers have outlined a series of skills they want these systems to develop, including “episodic memory” (answering questions like “where did I leave my keys?”) and “audio-visual diarization” (remembering who said what when).
Right now, the tasks outlined above cannot be achieved reliably by any AI system, and Facebook stresses that this is a research project rather than a commercial development. However, it’s clear that the company sees functionality like this as the future of AR computing. “Definitely, thinking about augmented reality and what we’d like to be able to do with it, there’s possibilities down the road that we’d be leveraging this kind of research,” Facebook AI research scientist Kristen Grauman told The Verge.
Such ambitions have huge privacy implications. Privacy experts are already worried about how Facebook’s AR glasses allow wearers to covertly record members of the public. Such concerns will only be exacerbated if future versions of the hardware not only record footage, but analyze and transcribe it, turning wearers into walking surveillance machines.

The name of Facebook’s research project is Ego4D, which refers to the analysis of first-person, or “egocentric,” video. It consists of two major components: an open dataset of egocentric video and a series of benchmarks that Facebook thinks AI systems should be able to tackle in the future.
The dataset is the biggest of its kind ever created, and Facebook partnered with 13 universities around the world to collect the data. In total, some 3,205 hours of footage were recorded by 855 participants living in nine different countries. The universities, rather than Facebook, were responsible for collecting the data. Participants, some of whom were paid, wore GoPro cameras and AR glasses to record video of unscripted activity. This ranges from construction work to baking to playing with pets and socializing with friends. All footage was de-identified by the universities, which included blurring the faces of bystanders and removing any personally identifiable information.
Grauman says the dataset is the “first of its kind in both scale and diversity.” The nearest comparable project, she says, contains 100 hours of first-person footage shot entirely in kitchens. “We’ve opened up the eyes of these AI systems to more than just kitchens in the UK and Sicily, but [to footage from] Saudi Arabia, Tokyo, Los Angeles, and Colombia.”
The second component of Ego4D is a series of benchmarks, or tasks, that Facebook wants researchers around the world to try and solve using AI systems trained on its dataset. The company describes these as:
Episodic memory: What happened when (e.g., “Where did I leave my keys?”)?
Forecasting: What am I likely to do next (e.g., “Wait, you’ve already added salt to this recipe”)?
Hand and object manipulation: What am I doing (e.g., “Teach me how to play the drums”)?
Audio-visual diarization: Who said what when (e.g., “What was the main topic during class?”)?
Social interaction: Who is interacting with whom (e.g., “Help me better hear the person talking to me at this noisy restaurant”)?
Right now, AI systems would find tackling any of these problems incredibly difficult, but creating datasets and benchmarks is a tried-and-tested method of spurring development in the field of AI.
Indeed, the creation of one particular dataset and an associated annual competition, known as ImageNet, is often credited with kickstarting the recent AI boom. The ImageNet dataset consists of pictures of a huge variety of objects, which researchers trained AI systems to identify. In 2012, the winning entry in the competition used a particular method of deep learning to blast past rivals, inaugurating the current era of research.

Facebook is hoping its Ego4D project will have similar effects for the world of augmented reality. The company says systems trained on Ego4D might one day not only be used in wearable cameras but also home assistant robots, which also rely on first-person cameras to navigate the world around them.
“The project has the chance to really catalyze work in this field in a way that hasn’t really been possible yet,” says Grauman. “To move our field from the ability to analyze piles of photos and videos that were human-taken with a very special purpose, to this fluid, ongoing first-person visual stream that AR systems, robots, need to understand in the context of ongoing activity.”
Although the tasks that Facebook outlines certainly seem practical, the company’s interest in this area will worry many. Facebook’s record on privacy is abysmal, spanning data leaks and a $5 billion fine from the FTC. It’s also been shown repeatedly that the company values growth and engagement above users’ well-being in many domains. With this in mind, it’s worrying that the benchmarks in the Ego4D project do not include prominent privacy safeguards. For example, the “audio-visual diarization” task (transcribing what different people say) never mentions removing data about people who don’t want to be recorded.
When asked about these issues, a spokesperson for Facebook told The Verge that it expected that privacy safeguards would be introduced further down the line. “We expect that to the extent companies use this dataset and benchmark to develop commercial applications, they will develop safeguards for such applications,” said the spokesperson. “For example, before AR glasses can enhance someone’s voice, there could be a protocol in place that they follow to ask someone else’s glasses for permission, or they could limit the range of the device so it can only pick up sounds from the people with whom I am already having a conversation or who are in my immediate vicinity.”
For now, such safeguards are only hypothetical.