When you post something on Instagram or Facebook, you probably think you’re just sharing it with your friends, family, and maybe a few others. But that’s not all. Everything you’ve ever posted publicly is being used to train Meta’s powerful AI. Mark Zuckerberg bragged about this vast library of content, which includes your posts, reels, and comments, during Meta’s earnings call on Thursday. Your social media profiles are now one of the most valuable datasets on Earth, and Meta is treating them as its own.
“On Facebook and Instagram there are hundreds of billions of publicly shared images and tens of billions of public videos,” said Meta’s CEO on its earnings call last week. “We estimate [this] is greater than the Common Crawl dataset, and people share large numbers of public text posts in comments across our services as well.”
This is Meta’s next big play. Instagram and Facebook have kept users hooked for the last 20 years, monetizing us through advertisers every step of the way. Now, Meta is revisiting your old posts, your special moments, and your big life updates, and using them to build billion-dollar AI tools. Zuckerberg’s boast about Meta’s enormous dataset comes shortly after The New York Times sued OpenAI for copyright infringement. But Meta is pulling an old trick out of its playbook: extracting as much value as humanly possible from Instagram and Facebook users, and taking total ownership of your online self.
Should Instagram automatically own your data to build Meta’s AI? Social media companies have yet to have a real conversation with their users about AI. Meanwhile, Sarah Silverman and other authors are already suing Meta, alleging that it used their books without permission. Meta has profited off its users’ data for years, but never to this extent. Elon Musk is doing the same thing with X, using all of Twitter to train xAI’s Grok.
Meta did not immediately respond to Gizmodo’s request for comment.
Meta revealed in September that it was using public Facebook and Instagram posts to train its new AI assistant. For context, the Common Crawl dataset Zuckerberg references contains over 250 billion web pages collected over 17 years. It’s one of the largest internet databases of human content, and it’s seen as a gold standard for training large language models. But by Zuckerberg’s own estimate, Meta’s data is larger, and it’s far more personal.
Zuckerberg found a goldmine sitting on Meta’s own shelf. Meta’s library of roughly two decades’ worth of Facebook and Instagram posts is now one of the most valuable assets the company has. Without any grand announcement or notice to users, Meta has effectively claimed ownership of your public social media profile and will use it to generate billions of dollars.
Meta’s large language model, Llama, is one of the best AI models in the world. The company uses it to power products like Meta AI, Imagine, and more, and hopes to infuse these AI products into Facebook, Instagram, and the Metaverse in the coming years.
Meta added $200 billion to its market cap in a single day last week, largely on the strength of its AI efforts. Book publishers and news organizations understand how valuable this data is to AI, but social media users, once again, are being kicked to the curb. Instagram and Facebook’s library of content is now one of the greatest assets in the world, and Meta is cashing in on your social media profile.