Report from Reuters
In Brief – Lawyers for the AI company Anthropic submitted a court filing that included a citation “hallucination” created by the company’s AI chatbot Claude. The filing was part of expert testimony from one of the company’s data scientists in a copyright lawsuit brought by a group of music publishers claiming that Anthropic illegally trained its chatbot on copyrighted music lyrics without authorization. A lawyer representing Universal Music Group told US Magistrate Judge Susan van Keulen that the data scientist’s filing cited a nonexistent academic article to bolster the company’s argument in a dispute over evidence. When the error was pointed out, Anthropic’s lawyers acknowledged that the citation contained mistakes, but maintained that the article itself was real, that its content supported the point made in the filing, and that the link to the paper was correct. According to the lawyers, when Claude was asked to properly format the filing’s citations, it apparently invented a title for the paper and attributed it to the wrong authors.
Context – The question of whether it is legal to “train” the neural networks of major generative AI models like Claude on non-licensed copyrighted material is the biggest legal issue surrounding AI. In the US, copyright lawsuits are taking center stage, and judges will determine how to apply the fair use doctrine. It’s a complex legal question, and both sides have strong arguments. In one of the big cases, involving image-generating services, the judge has said he wants to determine how generative AI systems work in order to ascertain whether they store and retrieve some form of copies, or whether they create new things. On that key question, the fact that all generative AI systems sometimes produce realistic-seeming fabrications is telling. The developers themselves are not sure how the systems work. Hallucinations are a clear case in point: because every output needs to be checked, they limit the usefulness of AI services in the many business fields where accuracy really matters. If the developers knew why the systems sometimes simply make things up and present them just like any other output, they would have solved the problem.
