Report from TechCrunch
In Brief – Encyclopedia Britannica, which owns Merriam-Webster and retains the copyright on over 100,000 online articles, has sued OpenAI for widespread copyright infringement. The publisher claims that OpenAI unlawfully scraped and used its content to train large language models, and that the AI developer violates copyright law when its AI systems generate responses containing verbatim or closely paraphrased excerpts of Britannica content. The complaint frames AI-generated answers as direct market substitutes for Britannica’s offerings that divert traffic and revenue away from the publisher. It further argues that OpenAI’s use of retrieval-augmented generation (RAG), which incorporates Britannica material into real-time responses, also violates copyright law. Finally, Britannica alleges violations of federal trademark law, asserting that ChatGPT sometimes produces made-up “hallucinated” information and falsely attributes it to the publisher, potentially damaging its reputation.
Context – The application of copyright law to AI training is a foundational legal and regulatory issue facing the industry. While the issue is being debated globally, lawsuits filed in US courts by a wide variety of IP rights holders soon after generative AI chatbots and image creators emerged are likely to be the most impactful, with the key question being how the “fair use” doctrine under US copyright law applies to training. Two conflicting court opinions released last summer illustrate the complexities. As existing cases plod forward, new suits keep coming, including one filed by BMG against Anthropic. At the same time, AI chatbots and internet search are functionally merging, so publisher copyright complaints are also targeting how AI chatbots scrape websites to make their answers more timely, a practice that is distinct from model training. While the largest AI developers continue to reject arguments that they must pay for content used in basic training, several of them, including Google, Meta, and OpenAI, are making deals with select publishers to include content in chatbot answers. Google’s dominance of traditional internet search brings related competition law challenges as it merges AI answers into its search results.
