The Hamburg Regional Court’s decision in Robert Kneschke v LAION e.V. has implications for the use of copyrighted material to develop AI training data. The Court ruled in favor of LAION e.V., which had been accused of using copyrighted images in its multi-modal dataset without obtaining the requisite consent. LAION e.V. was protected by the exception for text and data mining (TDM) for the purposes of “scientific research” under Article 3 of the EU Directive on Copyright in the Digital Single Market (DSM Directive). The article explores how multi-modal AI models function and how the German Copyright Act treats text and data mining.
The Hamburg Regional Court examined the legality of using copyright-protected works to create datasets for AI training and found that the creation of the LAION-5B dataset qualified as “scientific research”. LAION-5B is a multi-modal dataset intended to democratize research on large-scale multi-modal model training. Multi-modal models are AI systems that process information from multiple sources, such as text, audio, video, and image data, and are used in healthcare, education, science, and technology research. Once the plaintiff’s copyrighted image becomes part of an open dataset such as LAION-5B, it is available to millions of AI researchers, which deprives the author of compensation for their work.
While opt-out formalities strengthen rights holders’ positions by empowering them to negotiate licensing deals with AI and technology companies and obtain remuneration for their creative works, there is limited clarity on whether creators must opt out separately for every entity that trains AI models. For now, drawing on the American “fair use” approach and steering away from an “opt-out” mechanism may be ideal in the interests of greater access to knowledge, democratized research, and creativity.
The EU AI Act and emerging global practices place the onus of reserving rights on copyright holders, who must “opt out” if they do not want their works to be used by other creators, developers, and publishers. The plaintiff, Robert Kneschke, had asserted his opt-out: the website hosting his photographic works prohibits “downloading, scraping, or caching” of any content. However, the Court held that LAION’s reproduction was permitted under §60d of the German Copyright Act (the TDM exception for scientific research), which, unlike the general TDM exception in §44b, does not allow rights holders to opt out. The reservation therefore did not prevent the use.
Developers building AI training datasets will inevitably continue to download and scrape images they do not own, albeit for the larger goal of democratizing and enabling research and innovation. In India, where the Copyright Act, 1957 does not provide for a TDM exception, any policy must steer away from an “opt-out” mechanism. As India moves toward becoming a global AI hub, the growth of its AI-based creative industry must be safeguarded so that young innovators and developers can grow without fear of licensing formalities and infringement fines.
The Hamburg Court held that reproducing the images in order to analyze the “correlation” between each image and its existing text description for the LAION-5B dataset was an automated analysis aimed at gathering information, and therefore fell within the TDM exception. While examining how the law applies to datasets that index images as opposed to those that scrape them, the Court ruled that the use of the photographs was “scientific research”.
The Court also referred to Article 5(5) of the InfoSoc Directive, the “three-step test” that governs when exceptions to copyright may be applied. Separately, Article 4(3) of the DSM Directive allows the general TDM limitation on an author’s copyright to apply only if the use of their works has not been expressly “reserved” by the rights holders in an appropriate manner.
The Court recognized that the mere possibility of training AI systems on datasets such as LAION-5B in the future did not impair the plaintiff’s rights, and that LAION’s use was purely non-commercial. LAION-5B furthered “scientific research” because the creation of such datasets serves as a basis for training AI systems and, therefore, for future knowledge acquisition.
The Hamburg Court’s decision to permit the use of copyrighted material for developing AI training data under the scientific research exception, and its reading of that exception as a defense to claims of copyright infringement, merits serious consideration by IP and AI enthusiasts alike.
Given the variety of objectives these multi-modal models serve, it is evident that they are driven by enormous amounts of data, which raises questions that need further exploration. The Hamburg Court’s judgment is the first European case to explore the legality of using copyright-protected works to create datasets for AI training. Further clarity is expected on the interpretation of “scientific research” under the DSM Directive and on how the law applies differently to datasets that index images and those that actively scrape them (see Getty Images v Stability AI).