Reddit is under scrutiny from the Federal Trade Commission (FTC) over its AI data-licensing practices, which were revealed ahead of its planned IPO.
The FTC’s inquiry focuses on Reddit’s “sale, licensing, or sharing of user-generated content with third parties to train AI models.”
The inquiry comes as Reddit prepares to go public, with plans to price its shares between $31 and $34, potentially valuing the company at approximately $6.5 billion.
Reddit is sitting on one of the biggest gold mines in internet content history. Its intention to sell posts and comments has sparked heated debate among its 850 million average monthly users.
One Reddit post is titled “Since Reddit is selling user data officially now, are your stories safe?” with commenters agreeing to “start dumping useless garbage data into Reddit every day for the next sixty days.”
That’s an interesting point: the value of Reddit’s data depends entirely on what its users choose to post, and with such strong communities in place, the company shouldn’t be complacent about its entitlement to user-generated content.
Nevertheless, Reddit argues that selling data remains harmonious with its principles, stating, “The opportunity does not conflict with our values and the rights of our Redditors.”
Reddit’s financial outlook appears robust, with revenue rising 20% last year to $804 million, largely driven by advertising.
So far, Reddit has disclosed data licensing agreements valued at $203 million, and it expects to generate at least $66.4 million from these arrangements in 2024. That is a modest part of its total revenue but could grow substantially.
Reddit has already struck a partnership with Google aimed at training AI models, among other objectives. This highlights the importance of its data in a world where tech companies are increasingly willing to pay for training data rather than simply scrape dubious ‘public use’ sources.
Reflecting on the FTC’s comments, Reddit stated, “We are not surprised that the FTC has expressed interest” in its data licensing practices, attributing the scrutiny to “the novel nature of these technologies and commercial arrangements.”
Furthermore, Reddit asserts its belief in the legality of its practices, emphasizing, “We do not believe that we have engaged in any unfair or deceptive trade practice.”
The company also shared insights into the ongoing dialogue with the FTC, noting, “The letter indicated that the FTC staff was interested in meeting with us to learn more about our plans and that the FTC intended to request information and documents from us as its inquiry continues.”
The FTC has been taking a harder line on tech deals recently. Last November, the agency authorized new investigatory powers over AI companies.
The new paid data gold rush
Data has come cheaply to generative AI companies, with datasets compiled by organizations like Common Crawl and LAION forming the mainstay of training data.
However, that is changing, with copyright lawsuits racking up and the EU AI Act attempting to mandate tighter data practices for the industry.
Moreover, many websites are actively blocking AI web crawlers. The Wild West era of free training data might be ending.
Reddit isn’t the only company that knows the value of its content. Automattic, the parent company of WordPress and Tumblr, is reportedly in talks with Midjourney and OpenAI for a content and data deal.
As Reddit prepares for its IPO, the company’s trajectory will be closely watched by both regulators and Redditors.