Tumblr and WordPress posts will reportedly be used for OpenAI and Midjourney training

Tumblr and WordPress are reportedly set to strike deals to sell user data to artificial intelligence companies OpenAI and Midjourney. 404 Media reports that the platforms’ parent company, Automattic, is nearing completion of an agreement to provide data to help train the AI companies’ models.
It isn’t clear which data will be included, but the report suggests Automattic may have overreached initially. An alleged internal post from Tumblr product manager Cyle Gage suggests Automattic prepared to send private or partner-related data that wasn’t supposed to be included in the deal. The questionable content reportedly included private posts on public blogs, deleted or suspended blogs, unanswered (therefore, not publicly posted) questions, private answers, posts marked explicit and content from premium partner blogs (like Apple’s former music site).
The internal post suggests Automattic’s engineers are preparing a list of post IDs that should have been excluded. It isn’t clear whether the data had already been sent to the AI companies.
Engadget emailed Automattic to ask for comment on the report. The company replied with a published statement, claiming, “We will share only public content that’s hosted on WordPress.com and Tumblr from sites that haven’t opted out.” The statement notes that legal regulations don’t currently require AI companies’ web crawlers to abide by users’ opt-out preferences.
The final line of Automattic’s statement appears to align with the reported deals. “We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control,” Automattic wrote. “Our partnerships will respect all opt-out settings. We also plan to take that a step further and regularly update any partners about people who newly opt out and ask that their content be removed from past sources and future training.”
The company reportedly plans to launch a new opt-out tool on Wednesday that claims to allow users to block third parties — including AI companies — from training on their data. 404 Media reviewed an alleged internal FAQ Automattic prepared for the tool, which includes the answer, “If you opt out from the start, we will block crawlers from accessing your content by adding your site on a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt-out and ask that their content be removed from past sources and future training.”
The phrasing, which describes Automattic as “asking” the AI companies to remove the data, may be relevant.
An alleged internal document from Automattic’s AI head, Andrew Spittle, replying to a staff question about data-removal assurances when using the tool, explains, “We will notify existing partners on a regular basis about anyone who’s opted out since the last time we provided a list. I want this to be an ongoing process where we regularly advocate for past content to be excluded based on current preferences. We will ask that content be deleted and removed from any future training runs. I believe partners will honor this based on our conversations with them to this point. I don’t think they gain much overall by retaining it.”
So, if a Tumblr or WordPress user requests to opt out of AI training, Automattic will allegedly “ask” and “advocate for” their removal. And the company’s AI boss “believes” the AI companies will find it in their best interest to comply “based on our conversations.” (How’s that for reassurance!)
AI data training deals have become a lucrative opportunity for websites treading water in today’s slippery online publishing landscape. (Tumblr’s staff was reportedly reduced to a skeleton crew in late 2023.) Last week, Google struck a deal with Reddit (ahead of the latter’s IPO) to train on the platform’s vast knowledge base of user-created content. Meanwhile, OpenAI rolled out a partnership program last year to collect datasets from third parties to help train its AI models.
Update, February 27, 2024, 3:56 PM ET: This story has been updated to add a published statement from WordPress and Tumblr parent company Automattic.
This article originally appeared on Engadget at https://www.engadget.com/tumblr-and-wordpress-posts-will-reportedly-be-used-for-openai-and-midjourney-training-204425798.html?src=rss