Jun 25, 2025 03:25 PM IST
Meta’s Llama 3.1 model can reproduce over 40 percent of Harry Potter and the Philosopher’s Stone, along with long passages from other popular books, researchers say.
Meta’s Llama 3.1 model is showing just how far AI memorisation has come. Researchers from Stanford, Cornell, and West Virginia University found that the 70-billion-parameter model can recall and reproduce over 42 percent of Harry Potter and the Philosopher’s Stone, line for line, when prompted with the right cues. The findings have set off fresh debate about what happens when AI models remember too much, especially when it comes to copyrighted work.
AI’s growing appetite for books
Llama 3.1 isn’t just picking up a few famous quotes. The model can reliably generate long stretches of text from some of the world’s most popular books, including The Hobbit and 1984. The researchers broke 36 books into 100-token passages, then fed the model the first half of each passage to see whether it could reproduce the rest. For Harry Potter and the Philosopher’s Stone, Llama 3.1 managed this for more than 40 percent of the passages, far outpacing older models like Llama 1, which managed only around 4 percent on the same test.
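For readers curious what such a test looks like in practice, here is a minimal sketch of the idea, assuming an open-weight model loaded through the Hugging Face transformers library. The model name, the 50/50 token split, and the greedy-decoding check are illustrative assumptions, not the researchers’ exact protocol.

```python
# Minimal sketch of a memorisation probe: prompt with the first half of a
# 100-token passage and check whether greedy decoding reproduces the rest.
# Model name and token split are illustrative, not the study's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-70B"  # assumption: any open-weight causal LM works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def probe_passage(passage: str, prompt_tokens: int = 50, target_tokens: int = 50) -> bool:
    """Return True if greedy decoding from the first half of the passage
    reproduces the second half verbatim."""
    ids = tokenizer(passage, return_tensors="pt").input_ids[0]
    prompt_ids = ids[:prompt_tokens].unsqueeze(0).to(model.device)
    target_ids = ids[prompt_tokens:prompt_tokens + target_tokens]

    output = model.generate(
        prompt_ids,
        max_new_tokens=target_tokens,
        do_sample=False,  # greedy decoding: take only the most likely next token
    )
    continuation = output[0, prompt_ids.shape[1]:]  # strip the prompt from the output
    return torch.equal(continuation.cpu(), target_ids)
```

Running a check along these lines over every 100-token window of a book yields the kind of per-passage hit rate the researchers report.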
The study also found that the more popular the book, the more likely the model was to reproduce it accurately. Lesser-known works hardly registered, while bestsellers were easy targets. This raises questions for writers and publishers about how exposed their work is when AI models are trained on massive datasets scraped from the web.
Legal and creative questions ahead
With AI companies like Meta already facing lawsuits over their training methods, these findings land at a sensitive moment. If a model can serve up large sections of a copyrighted book, it is not just a technical achievement; it is a legal and ethical dilemma. The research team points out that open-weight models like Llama 3.1 are easier to test for memorisation, since researchers can access the model’s weights and the probabilities it assigns to text, and so measure directly what it remembers. This transparency could make open models more vulnerable to legal scrutiny than their closed-source rivals.
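That is the crux of the transparency point: with the weights in hand, a researcher does not have to hope the model happens to output the book on request, but can score how much probability the model assigns to the exact original continuation. The sketch below, which reuses the tokenizer, model, and token split from the earlier example, is one illustrative way to do that; it is an assumption about how such a measurement could be done, not code from the paper.

```python
# Sketch of why open weights matter: direct access to the logits lets a
# researcher score the exact original continuation token by token, rather
# than relying on whatever text a closed API happens to return.
import torch

def continuation_log_prob(passage: str, prompt_tokens: int = 50, target_tokens: int = 50) -> float:
    """Sum of log-probabilities the model assigns to the original second half
    of the passage, conditioned on the first half."""
    ids = tokenizer(passage, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    for pos in range(prompt_tokens, min(prompt_tokens + target_tokens, ids.shape[1])):
        token_id = ids[0, pos]
        # logits at position pos-1 predict the token at position pos
        total += log_probs[0, pos - 1, token_id].item()
    return total  # exp(total) is the probability of reproducing the passage verbatim
```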
For authors, the study is a reminder that the biggest and most beloved books are also the most at risk. For the AI industry, it is a sign that the old ways of collecting and using data are under the microscope, and that copyright law and AI development may need to be brought into closer alignment so that creative industries don’t suffer serious setbacks in the near future. As the legal battles heat up, the spotlight will stay on how these powerful models handle the stories and ideas that shape our culture.