Authors Angry as Tech Companies Use Their Books to Train AI without Permission
Many authors are expressing frustration and concern after discovering that tech companies have been using their books to train artificial intelligence (AI) without their knowledge or consent. This practice has raised issues of copyright infringement and potential loss of income for these authors.
The dataset in question is called Books3, a collection of pirated e-books spanning many genres, including erotic fiction and prose poetry. According to an investigation by The Atlantic, training on books improves the ability of generative AI systems to communicate information.
In an article, The Atlantic highlighted the concern that “The future promised by AI is written with stolen words.” While AI models can learn from existing text found on the internet, higher-quality input, such as that found in books, is necessary for generating top-notch AI content.
However, many authors do not see the use of their books as an honor. Instead, they argue that it is a shortcut that deprives them of due credit and compensation for their work. Romance novelist Nora Roberts, who has 206 books in the Books3 dataset, expressed her dissatisfaction, stating, “We are human beings, we are writers, and we are being exploited by people who want to use our work, again without permission or compensation, to ‘write’ books, scripts, essays because it’s cheap and easy.”
Several authors, including Sarah Silverman, Richard Kadrey, and Christopher Golden, have filed a lawsuit against Meta, the owner of Facebook, alleging copyright infringement. They argue that Meta violated their rights by using their books to train its language model LLaMA, a competitor to OpenAI’s GPT-4.
Alex Reisner of The Atlantic further fueled the controversy by publishing a searchable database that allows anyone to see whether their favorite authors’ works are being used to train AI systems. Prominent authors like Stephen King, John Kratz, and James Patterson are among those on the list. The books were gathered through web-crawling technology that found bootleg PDF copies of these works online.
In response to the situation, The Authors Guild published a guide for authors who discover their books are part of the Books3 dataset, providing suggestions on copyright protection and future implications related to AI. Additionally, the guild and 17 authors filed a class-action suit against OpenAI in New York for copyright infringement. Authors such as David Baldacci, Mary Bly, Michael Connelly, John Grisham, Jodi Picoult, Scott Turow, and Rachel Vail are among the plaintiffs in the case.
The impact of this controversy extends beyond just the authors. The complaint filed against OpenAI emphasizes how the plaintiffs’ books were utilized to power ChatGPT and thousands of other applications and enterprise uses, potentially earning OpenAI significant profits.
Notably, Meta issued a “takedown” order against a developer who used leaked LLaMA code, claiming that no one is authorized to exhibit, reproduce, transmit, or distribute Meta Properties without explicit written permission from Meta. Although Meta has made LLaMA open-source, it still requires developers to obtain a license to use it.
While some authors are angered by the unauthorized use of their work, others welcome it. One is Ian Bogost, author of “Play Anything: The Pleasure of Limits, the Uses of Boredom and the Secret of Games.” His column for The Atlantic is headlined “My Books Were Used to Train Meta’s Generative AI. Good.” He believes that successful art goes beyond the creator’s intentions and that attempting to restrict the use of his writing in one avenue would undermine its potential benefits in other unexpected ways.
Opinion: Balancing Rights and Opportunities in AI Training
The use of authors’ books to train AI models has ignited a contentious debate over rights and opportunities. On one side, authors argue for the protection of their intellectual property and fair compensation for their work. They express frustration at being exploited by tech companies that use their creations without permission. These concerns are legitimate, and creators should have a say in how their works are used.
However, it is also crucial to consider the potential benefits that AI training can bring to the literary landscape. AI-generated content has the capacity to unearth new perspectives, create innovative narratives, and even empower aspiring writers by offering them an AI-driven platform for content creation. By embracing the unexpected uses of their work, authors may discover unexplored avenues for their storytelling and artistic expression.
Nonetheless, a balance must be struck. Tech companies should prioritize obtaining proper authorization and compensating authors for their contributions to AI training. Respect for intellectual property rights is essential for maintaining trust and fostering healthy collaborations between authors and AI developers. Clear guidelines and transparent procedures can facilitate consent-based partnerships, ensuring that authors’ work is utilized ethically and with mutual benefits.
In the rapidly evolving landscape of AI technology, it is crucial to navigate the intersection of creativity, copyright, and AI training responsibly. By cultivating an environment where authors’ rights are safeguarded while also embracing the potential of AI, we can foster a future where AI and human creativity coexist harmoniously.
Editor’s Notes: The Importance of Ethical AI Training
The recent revelations about tech companies using authors’ books to train AI without permission highlight the need for ethical practices in AI development. Respect for intellectual property and fair compensation are fundamental principles that must be upheld in the pursuit of AI advancement.
At GPT News Room, we are committed to promoting responsible AI training and fostering a culture of transparency, consent, and collaboration. Our aim is to empower both authors and AI developers to navigate this landscape while upholding the rights and creative integrity of all parties involved.