Meta to Train AI on EU Public Content After Regulatory Scrutiny

Meta is moving forward with plans to train its AI models on public content from Facebook and Instagram users in the European Union. The decision follows a period of regulatory scrutiny and adjustments to address data privacy concerns. Meta had paused these plans in response to regulatory pressure, but the company now says training will begin this week. The move comes after Meta launched a limited version of Meta AI in the EU last month, significantly later than its U.S. debut.

For years, Meta has been training its AI on user-generated content in the U.S. However, the stricter privacy laws in the EU, particularly the General Data Protection Regulation (GDPR), posed a challenge. The GDPR requires a solid legal basis for processing personal data to train AI models.

Meta initially paused its plans in June 2024 due to pushback from the Irish Data Protection Commission (DPC), which regulates Meta in the EU on behalf of several data protection authorities. In September 2024, Meta restarted its efforts to train AI systems using public posts from its U.K. user base.

Meta stated in a blog post: “Last year, we delayed training our large language models using public content while regulators clarified legal requirements. We welcome the opinion provided by the EDPB in December, which affirmed that our original approach met our legal obligations. Since then, we have engaged constructively with the IDPC and look forward to continuing to bring the full benefits of generative AI to people in Europe.”

Starting this week, EU users will receive in-app and email notifications explaining that Meta will use public data and interactions with Meta AI to train its models. Each notification will include a link to an opt-out form that lets users object to their data being used. Meta has committed to honoring all objection forms, both previously submitted and new. Meta will not use private messages or data from users under 18 in the EU for training purposes.

Meta says its AI models need to be trained on diverse data to understand the nuances of European communities, including dialects, colloquialisms, hyper-local knowledge, and humor. The company also said it is following the example set by Google and OpenAI, both of which have already used data from European users to train their AI models.

The DPC continues to scrutinize how developers of large language models train their AI services. It recently announced an investigation into xAI’s training of Grok.