Open-source AI must reveal its training data, per new OSI definition
The Open Source Initiative (OSI) has released its official definition of “open” artificial intelligence, setting the stage for a clash with tech giants like Meta, whose models don’t fit the rules.


OSI has long set the industry standard for what constitutes open-source software, but AI systems include elements that aren’t covered by conventional licenses, like model training data. Now, for an AI system to be considered truly open source, it must provide:
- Access to details about the data used to train the AI so others can understand and re-create it
- The complete code used to build and run the AI
- The settings and weights from the training, which help the AI produce its results
This definition directly challenges Meta’s Llama, widely promoted as the largest open-source AI model. Llama is publicly available for download and use, but it has restrictions on commercial use (for applications with over 700 million users) and does not provide access to training data, causing it to fall short of OSI’s standards for unrestricted freedom to use, modify, and share.
Meta spokesperson Faith Eischen told The Verge that while “we agree with our partner OSI on many things,” the company disagrees with this definition. “There is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today’s rapidly advancing AI models.”
“We will continue working with OSI and other industry groups to make AI more accessible and free responsibly, regardless of technical definitions,” Eischen added.
For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them. The Linux Foundation has also made a recent attempt to define “open-source AI,” signaling a growing debate over how traditional open-source values will adapt to the AI era.
“Now that we have a robust definition in place maybe we can push back more aggressively against companies who are ‘open washing’ and declaring their work open source when it actually isn’t,” Simon Willison, an independent researcher and creator of the open-source multi-tool Datasette, told The Verge.
Hugging Face CEO Clément Delangue called OSI’s definition “a huge help in shaping the conversation around openness in AI, especially when it comes to the crucial role of training data.”
OSI executive director Stefano Maffulli says the initiative spent two years refining the definition through a collaborative process, consulting experts around the world. These included machine learning and natural language processing researchers from academia, philosophers, content creators from the Creative Commons world, and more.
While Meta cites safety concerns for restricting access to its training data, critics see a simpler motive: minimizing its legal liability and safeguarding its competitive advantage. Many AI models are almost certainly trained on copyrighted material; in April, The New York Times reported that Meta internally acknowledged there was copyrighted content in its training data “because we have no way of not collecting that.” There’s a litany of lawsuits against Meta, OpenAI, Perplexity, Anthropic, and others for alleged infringement. But with rare exceptions — like Stable Diffusion, which reveals its training data — plaintiffs must currently rely on circumstantial evidence to demonstrate that their work has been scraped.
Meanwhile, Maffulli sees open-source history repeating itself. “Meta is making the same arguments” as Microsoft did in the 1990s when it saw open source as a threat to its business model, Maffulli told The Verge. He recalls Meta telling him about its intensive investment in Llama, asking him “who do you think is going to be able to do the same thing?” Maffulli saw a familiar pattern: a tech giant using cost and complexity to justify keeping its technology locked away. “We come back to the early days,” he said.
“That’s their secret sauce,” Maffulli said of the training data. “It’s the valuable IP.”