Would you have allowed Meta to use your Facebook and Instagram posts to train their AI models? The reality is you didn’t have a choice.
Meta has confirmed that it used Facebook and Instagram data to train its new AI assistant.
Back in August, a Meta spokesperson said that Llama 2 “wasn’t trained on Meta user data, and we have not launched any Generative AI consumer features on our systems yet.”
This stance changed with the company’s new multi-platform generative AI assistant, which was trained on people’s public posts and comments. However, Meta said it deliberately avoided using private posts shared among family and friends.
Nick Clegg, Meta’s President of Global Affairs, said during the company’s annual Connect conference, “We’ve tried to exclude datasets that have a heavy preponderance of personal information.” He added that the “vast majority” of the data Meta used for training was publicly available.
Clegg gave an example, citing LinkedIn as a platform whose content Meta deliberately decided not to use due to privacy concerns.
Tech giants like Meta, OpenAI, and Google have recently been criticized for using internet data without consent to train their AI models.
These companies are now grappling with decisions about using private or copyrighted content in their AI systems and are facing legal challenges from authors who accuse them of copyright infringement. Several high-profile authors, including Game of Thrones creator George R.R. Martin, recently joined a deluge of lawsuits filed against OpenAI and Meta.
At Meta’s Connect event, CEO Mark Zuckerberg unveiled Meta AI, highlighting it as one of the company’s primary consumer-focused AI offerings. Unlike previous events, this year’s conference focused mainly on AI rather than on the company’s augmented and virtual reality metaverse project, which is largely defunct.
Meta’s AI assistant is built on a customized model inspired by the Llama 2 language model, which the company released for public and commercial use in July. The company also developed Emu, a model designed to generate images from text prompts.
Clegg highlighted that this AI training process used text and images from public Facebook and Instagram posts.
A representative from Meta stated that the Emu model was specifically trained for image generation using these public posts.
The chat functionalities, on the other hand, were based on the Llama 2 model, which was supplemented with some publicly accessible and annotated datasets.
On safety, Clegg said that specific measures were in place, such as prohibiting the AI tool from generating hyper-realistic images of public figures. On the contentious question of whether the use of creative content is covered by existing fair use doctrine, Clegg said, “We think it is, but I strongly suspect that’s going to play out in litigation.”
When asked about Meta’s precautions against replicating copyrighted images, a company representative pointed to its updated terms of service, which prohibit users from creating content that violates privacy and intellectual property rights.