
US comedian and author Sarah Silverman, along with novelists Christopher Golden and Richard Kadrey, is suing OpenAI over claims of copyright infringement.

In a series of class action lawsuits filed Friday, Silverman, Golden and Kadrey claim OpenAI used their copyright-protected content to train the AI language model GPT, which powers a range of chatbots including ChatGPT.

The trio allege that when prompted, ChatGPT will generate a summary of their work based on their own writing. This, they claim, infringes copyright laws as they did not give their consent to their books being fed to the generative AI chatbot. 

“OpenAI made copies of Plaintiffs' books during the training process of the OpenAI Language Models without Plaintiffs' permission,” the lawsuit reads. 

“Specifically, OpenAI copied at least Plaintiff Tremblay's book The Cabin at the End of the World; and Plaintiff Awad's books 13 Ways of Looking at a Girl and Bunny.” 

Silverman and the authors also claim that OpenAI’s ChatGPT breaches the Digital Millennium Copyright Act (DMCA) by regurgitating their content without the legally required copyright management information found in the original books.

“At no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works,” Silverman et al. state in a second suit.

Double whammy 

It’s not just OpenAI facing the authors’ legal wrath. In a separate lawsuit against Meta, Silverman, Golden and Kadrey allege their books were accessible in datasets Meta used to train a series of open-source AI models the tech titan introduced in February.

The authors point to a paper Meta released earlier this year, which details the sources the company used to train its AI models, one of which is ThePile.

Lawsuit against OpenAI detailing how it scraped shadow libraries to train ChatGPT.

ThePile, the suit notes, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” are “flagrantly illegal,” the suit states.

In both complaints against Meta and OpenAI, the authors say they “did not consent to the use of their copyrighted books as training material” for the companies’ AI systems. 

They ask that both companies pay statutory damages and restitution of profits, and have requested a permanent injunction to stop them from continuing their actions. 

A threat to copyright

Silverman et al.’s lawsuit is just the latest legal action targeting AI companies like OpenAI. At the end of June, a US law firm hit OpenAI with a $3 billion lawsuit for allegedly violating privacy laws by scraping data from the web to train ChatGPT.

Meanwhile, in January, Getty Images sued Stability AI for allegedly taking millions of copyright-protected images from its site to train the AI image generator Stable Diffusion.

Experts warn that the method by which AI firms obtain their data may lead to the copyright-protected work of millions of content creators being stolen, raising questions about the future of creative industries and the ability to tell fact from fiction.

While neither Meta nor OpenAI has revealed exactly which resources it has scraped from the web, both have admitted to using hundreds of thousands of copyrighted books stored on shadow library websites, including those referenced in the authors’ suits.

OpenAI trains its Large Language Models (LLMs) by scraping publicly available text and images from the internet. These resources are not limited to books: they also include blogs, websites and even social media posts shared online.

This method of taking content from the web currently sits in a legal grey area, with lawmakers struggling to decide if AI companies’ scraping of data breaks copyright laws. 

Since AI technologies are still in continuous development, it is yet to be seen whether governments will be able to legally prevent companies from taking creators’ work without their consent.

The EU recently passed the world’s first AI Act, which serves to protect against the mass harvesting of people’s private and sensitive data by AI companies.

But it remains unclear if this legislation would enforce any sort of restrictions on the scraping of publicly-available online content by these companies.