Cornell researchers identify verbatim poems in AI models like ChatGPT

January 13, 2024


A recent study by Cornell University has shed light on the capabilities of AI chatbots like ChatGPT in memorizing and reproducing poems, including those under copyright. 

The study raises ethical and copyright concerns about the data sources used to train AI models, a red-hot topic in the industry right now following the recent New York Times lawsuit against OpenAI and controversies at Midjourney.

David Mimno, study author and associate professor of information science, explained why they chose poems: “They’re short enough to fit in the context size of a language model. Their status is complicated: many of the poems we studied are technically under copyright, but they’re also widely available from reputable sources like the Poetry Foundation.”

The study encompassed ChatGPT and other models, including Google AI’s PaLM, EleutherAI’s Pythia, and OpenAI’s GPT-2. Study author Lyra D’Souza compiled a selection of 240 poems from 60 American poets of varied backgrounds and presented them to these models. 

Researchers used specific prompts to request poems from these models. These prompts varied, including asking for poems by their titles, authors, or even starting lines. This was important to test whether the models could accurately recall and reproduce the requested poem.

ChatGPT successfully retrieved 72 of the 240 poems and PaLM managed 10, while GPT-2 and Pythia failed to fully recall any. 

The primary determinant of a chatbot’s ability to memorize a poem was its inclusion in the poetry canon, with the poet’s race, gender, and era being less significant.

A poem being published in the Norton Anthology of Poetry, particularly the 1983 edition, was the most reliable indicator of it being memorized and returned verbatim.

Moreover, the researchers found that responses changed over time, with ChatGPT later handling copyrighted poems unpredictably, sometimes refusing requests for whole verbatim poems. 

Lyra D’Souza, author of the study, expressed concerns to the Cornell Chronicle about large language models (LLMs) memorizing extensive texts, highlighting privacy and copyright implications: “It’s generally not good for large language models to memorize large chunks of text, in part because it’s a privacy concern.”

This research, currently focused on American poetry, aims to expand to include responses to poetry in various languages and to assess how specific poetic features influence the likelihood of memorization.

Moreover, while the study identifies copyrighted poems in training data and demonstrates that models can recall them verbatim, it doesn’t shed light on where those poems were sourced.

Popular poems are likely to appear in numerous locations on the web, such as forums and blogs, so it’s unsurprising that models trained on data scraped from the general web recall them well.

How the study worked

Here’s more detail on how the study, “The Chatbot and the Canon: Poetry Memorization in LLMs,” presented at the Computational Humanities Research Conference, was conducted:

  1. Building a diverse poetry collection: The researchers compiled a dataset of 240 poems by 60 American poets, ensuring a wide range of time periods, ethnicities, genders, and degrees of fame. The study involved various language models, including ChatGPT, Google’s PaLM, Pythia from EleutherAI, and OpenAI’s GPT-2. 
  2. Designing prompts: Researchers used specific prompts to request poems from these models. These prompts varied, including asking for poems by their titles, authors, or even starting lines. 
  3. Evaluating model responses: The responses from the AI models were analyzed to determine whether they could accurately reproduce the requested poems. The key metric was the accuracy of the reproduction, which involved checking if the models could recall the exact text of the poems.
  4. Analyzing factors influencing memorization: The study also examined factors influencing a model’s ability to memorize poems. This included analyzing whether the presence of a poem or poet in well-known anthologies, like the Norton Anthology of Poetry, or the poet’s race, gender, and Wikipedia page length impacted the likelihood of a poem being memorized by the AI models.
  5. Conclusions and implications: The study concluded that larger models like ChatGPT and PaLM were more successful at memorizing and reproducing poems. It highlighted how AI models trained on web-scraped data might reinforce existing literary biases.
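The evaluation step described above can be sketched in code. The prompt templates, whitespace normalization, and function names below are illustrative assumptions for the sake of a runnable example, not the study’s published code:

```python
# Sketch of a verbatim-recall check like the one the study describes:
# prompt a model for a poem, then test whether the full reference text
# appears exactly (up to whitespace and case) in the model's output.
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so that line-break or casing
    differences don't count against an otherwise verbatim reproduction."""
    return re.sub(r"\s+", " ", text).strip().lower()


def make_prompts(title: str, author: str, first_line: str) -> list[str]:
    """Vary the request, as the study did: by title/author or by opening line."""
    return [
        f'Recite the poem "{title}" by {author}.',
        f'What poem begins with the line "{first_line}"? Give the full text.',
    ]


def is_verbatim(model_output: str, reference: str) -> bool:
    """True if the entire reference poem appears verbatim in the output."""
    return normalize(reference) in normalize(model_output)


def recall_rate(results: list[tuple[str, str]]) -> float:
    """Fraction of (model_output, reference_poem) pairs recalled verbatim."""
    hits = sum(is_verbatim(out, ref) for out, ref in results)
    return hits / len(results) if results else 0.0
```

Under this metric, a model that returns the poem embedded in chatty framing ("Sure! Here it is: ...") still counts as a verbatim recall, while a refusal or a paraphrase does not.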

This study not only revealed the capabilities of AI models in processing poetry but also highlighted the potential for existing literary biases to be mirrored and perpetuated by them. 

If humanity begins to rely on AI as a kind of encyclopedia, can we trust it to represent works fairly? Given the inherent challenges of achieving fair and diverse representation within training data, probably not.


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.
