Researchers at The University of Texas at Austin have developed an innovative framework for training AI models on heavily corrupted images.
Known as Ambient Diffusion, this method enables AI models to ‘draw inspiration’ from images without directly copying them.
Conventional text-to-image models such as DALL-E, Midjourney, and Stable Diffusion risk copyright infringement because they’re trained on datasets that include copyrighted images, which they sometimes inadvertently replicate.
Ambient Diffusion flips that on its head by training models with deliberately corrupted data.
In the study, the research team, including Alex Dimakis and Giannis Daras from the Electrical and Computer Engineering department at UT Austin and Constantinos Daskalakis from MIT, trained a Stable Diffusion XL model on a dataset of 3,000 celebrity images.
Initially, the models trained on clean data were observed to blatantly copy the training examples.
However, when the training data was corrupted by randomly masking up to 90% of the pixels, the model still produced high-quality, unique images.
This means the AI is never exposed to recognizable versions of the original images, preventing it from copying them.
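To make the corruption step concrete, here is a minimal Python sketch of random pixel masking. It is an illustration only, not the researchers' training code: the article says only that up to 90% of pixels were randomly masked, so the independent per-pixel mask, the zero fill value, and the hypothetical names mask_pixels and mask_fraction are all assumptions.

```python
import numpy as np


def mask_pixels(image, mask_fraction, rng):
    """Hide a random fraction of pixel locations in an H x W x C image.

    Assumption: each pixel location is dropped independently with
    probability `mask_fraction`, and dropped pixels are set to zero.
    Returns the corrupted image and the boolean mask of kept pixels.
    """
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be between 0 and 1")
    height, width = image.shape[:2]
    # True where the pixel survives, False where it is masked out.
    keep = rng.random((height, width)) >= mask_fraction
    # Broadcast the 2-D mask over the channel axis.
    corrupted = image * keep[..., None]
    return corrupted, keep


# Example: corrupt a dummy 64 x 64 RGB image at the 90% level.
rng = np.random.default_rng(0)
dummy_image = np.ones((64, 64, 3))
corrupted_image, kept = mask_pixels(dummy_image, 0.9, rng)
```

Raising mask_fraction in this sketch corresponds to the heavier corruption levels described above, where less of each original image ever reaches the model.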
“Our framework allows for controlling the trade-off between memorization and performance,” explained Giannis Daras, a computer science graduate student who led the work.
“As the level of corruption encountered during training increases, the memorization of the training set decreases.”
Scientific and medical applications
The uses of Ambient Diffusion extend beyond resolving copyright issues.
According to Professor Adam Klivans, a collaborator on the project, “The framework could prove useful for scientific and medical applications too. That would be true for basically any research where it is expensive or impossible to have a full set of uncorrupted data, from black hole imaging to certain types of MRI scans.”
This is particularly beneficial in fields with limited access to uncorrupted data, such as astronomy and particle physics.
In these fields and others, data can be extremely noisy, poor-quality, or sparse, so useful data is heavily outnumbered by useless data. Teaching models to use sub-optimal data more efficiently would be helpful here.
If the Ambient Diffusion approach were further refined, AI companies could create functional text-to-image models while respecting the rights of original content creators and avoiding legal issues.
While that wouldn’t solve concerns that AI image tools reduce the pool of work for real artists, it would at least protect their works from being accidentally replicated in outputs.