Hamburg Regional Court
File Number: 310 O 227/23
Announced on 27.09.2024
Clerk of the Court
Judgment
IN THE NAME OF THE PEOPLE
In the matter of
XXX, XXX
- Plaintiff -
Represented by: Attorneys SLD Intellectual Property Law Firm, Alstadt 260, 84028 Landshut, File No.: 23-00061
versus
XXX e.V., XXX
- Defendant -
Represented by: Attorneys Heidrich Law Firm, Prinzenstraße 3, 30159 Hannover, File No.: 126/23-NA-UM
the Hamburg Regional Court - Civil Chamber 10 - through the Presiding Judge at the Regional Court Hartmann, the Judge Dr. Kanzler, and the Judge at the Regional Court Dr. Woodtli, on 27.09.2024, based on the oral hearing of 11.07.2024, declares as follows:
1. The complaint is dismissed.
2. The plaintiff has to bear the costs of the legal dispute.
3. The judgment is provisionally enforceable. The plaintiff may avert the defendant’s enforcement by providing a security deposit of 110% of the enforceable amount under this judgment, unless the defendant provides a security deposit of 110% of the enforceable amount prior to the enforcement.
Statement of Facts
The defendant is an association that was founded at its founding meeting on 07.07.2021 (minutes in Exhibit B7, statutes in Exhibit B1). The specific purpose of the defendant’s activities is disputed between the parties.
The defendant provides, under the designation “XXX,” a so-called dataset for image-text pairs publicly and free of charge. This is a kind of tabular document that contains hyperlinks to images or image files publicly accessible on the Internet as well as other information related to the respective images, including an image description (also called alternative text), which provides information about the content of the image in text form. The dataset includes 5.85 billion such image-text pairs. The dataset can be used for training so-called generative Artificial Intelligence. The creation of the dataset took place after the founding of the defendant in the second half of 2021.
For this purpose, the defendant relied on an already existing dataset of XXX from the USA (www.XXX.org), which, for a kind of random cross-section of images found on the Internet, contained the respective URLs along with the textual description of the respective image content. The defendant then extracted the URLs of the images from this dataset and downloaded the images from their respective storage locations. The images were then checked by software at the defendant’s premises to determine whether the description of the image content in the preexisting dataset actually corresponded to the content visible in the image. Images in which text and image content did not sufficiently match were filtered out. For the remaining images, the metadata, especially the URL of the image’s storage location and the image description, were extracted and included in a newly created dataset, the “XXX” dataset. Whether the downloaded image files were subsequently deleted is disputed between the parties, at least concerning the image in dispute.
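For illustration only, the process described above (download, comparison of image and description, filtering, retention of metadata) can be sketched as follows. This is a minimal sketch and not the defendant's actual software: the names `Pair`, `build_dataset`, `download`, `match_score` and the `threshold` value are hypothetical, and the stand-in download and scoring functions merely simulate the steps. The sketch does not model whether or when downloaded files were deleted, which is disputed between the parties.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Pair:
    """One image-text pair: the image URL and its textual description."""
    url: str
    caption: str


def build_dataset(
    pairs: Iterable[Pair],
    download: Callable[[str], Optional[bytes]],
    match_score: Callable[[bytes, str], float],
    threshold: float,
) -> List[Pair]:
    """Keep only the metadata of pairs whose image matches its description."""
    kept = []
    for pair in pairs:
        image = download(pair.url)           # fetch the image for analysis
        if image is None:                    # image no longer reachable
            continue
        if match_score(image, pair.caption) >= threshold:
            # Only URL and description enter the new dataset, not the image.
            kept.append(Pair(pair.url, pair.caption))
    return kept


# Hypothetical stand-ins so the sketch runs without network access.
fake_store = {
    "https://example.org/a.jpg": b"cat",
    "https://example.org/b.jpg": b"dog",
}


def fake_download(url: str) -> Optional[bytes]:
    return fake_store.get(url)


def fake_score(image: bytes, caption: str) -> float:
    # Pretend the image "content" is a word and check it against the caption.
    return 1.0 if image.decode() in caption else 0.0


source_pairs = [
    Pair("https://example.org/a.jpg", "a cat on a sofa"),
    Pair("https://example.org/b.jpg", "a red car"),
    Pair("https://example.org/missing.jpg", "unknown"),
]
result = build_dataset(source_pairs, fake_download, fake_score, threshold=0.5)
```

In this simulated run, only the first pair passes the comparison. The second is filtered out because image and description do not match, and the third because the image cannot be retrieved.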
As part of the aforementioned process, the disputed image was captured, downloaded, analyzed, and incorporated into the XXX dataset. Specifically, an image file posted on the website of the image agency XXX (https://www.XXX.com) was downloaded, which was marked with a watermark from the photo agency XXX.
On the website of the image agency XXX, the following text was displayed on the subpage https://www.XXX.com/de/usage.html at least since 13.01.2021:
“RESTRICTIONS:
YOU MAY NOT:
(...)
18. Use automated programs, applets, bots or the like to access the XXX.com website or any content thereon for any purpose, including, by way of example only, downloading content, indexing, scraping, or caching any content on the website.”
The plaintiff claims an infringement of copyright in the disputed photograph in the form of unauthorized reproduction as part of the analysis process by the defendant.
The plaintiff claims that he is the author of the photograph mentioned in the ruling. The company XXX was authorized to offer the disputed photograph on its website XXX.com and to display it, as well as to offer licenses for the photograph; XXX, in this respect, held non-exclusive, sublicensable usage rights.
The reproduction, which undisputedly took place as part of the analysis process, infringes his rights under § 16 of the German Copyright Act (UrhG). In particular, it is not covered by the limitations of §§ 44a, 44b, and 60d UrhG:
The limitation of § 44a UrhG does not apply; the independent download of a photograph does not constitute a temporary act within the meaning of this provision.
The reproduction is also not covered by § 44b UrhG. Consolidating data for AI training purposes is not text or data mining in the sense of § 44b UrhG. Neither the European nor the German legislator had such use “in mind” when creating the limitation provision of Art. 4 DSM-RL or § 44b UrhG. In text and data mining within the meaning of § 44b UrhG, only “information hidden in the data should be extracted,” “but not the content of the intellectual creation should be used.” In the present case of so-called “AI web scraping,” it is, however, precisely about the intellectual content of the works used for training purposes “and ultimately about the creation of identical or similar competing products.”
Furthermore, the dataset is, according to the defendant’s own disclaimer (printed on Bl. 47), "uncurated." Finally, the collection and storage for the creation of parallel archives are expressly excluded by the will of the legislator from the limitation provision of § 44b UrhG.
In addition, the "mass incorporation of copyrighted works for training purposes as part of generative AI" impairs the normal exploitation of copyrighted works because it creates the conditions to replace authors in many cases or at least significantly complicates the exploitation of the work through a free competing offer. This, according to Art. 7 para. 2 DSM-RL in conjunction with Art. 5 para. 5 InfoSoc-RL, stands in the way of applying the limitation provision.
In any case, the reproduction is unlawful due to the reservation of use declared on the website www.bigstockphoto.com according to § 44b para. 3 UrhG. The corresponding declaration of the image agency can be attributed to the plaintiff, as it distributes the disputed photograph for him. Contrary to the defendant’s opinion, the reservation is also machine-readable within the meaning of § 44b para. 3 sentence 2 UrhG. The requirements in this regard are no higher than those for readability by humans, and the reservation is written in printed text. Moreover, the text is also recognizable as a reservation for a computer program. Thus, the service XXX could detect the corresponding reservation, and specific tools such as WebOpt-Out could detect reservations like the one on XXX.com.
Nor can the defendant invoke the limitation provision of § 60d UrhG. The plaintiff disputes that the defendant fulfills the requirements of § 60d UrhG in factual terms, namely:
• That it was “entered in the association register” at the time of the disputed reproduction act;
• That the document B1 presented by the defendant represents the valid articles of association of the defendant or represented them at the time of the disputed reproduction act;
• That the association members and the board members are working voluntarily or were doing so at the time of the disputed reproduction act;
• That the defendant is exclusively engaged in research activities or was so at the time of the disputed reproduction act, or that the defendant, in conducting scientific research, pursues noncommercial purposes, reinvests all profits into scientific research, or operates under a state-recognized mandate in the public interest;
• That the defendant creates and tests its own AI models based on the training data to further research the possibilities of AI technology;
• That the public provision of the training dataset should enable other researchers and interested parties to train their own AI models; according to the defendant’s own statements, the disputed dataset was also used to train the services "XXX," "XXX," and "XXX" from the provider XXX, but these are operated by (purely) commercial companies; as far as the defendant denies training the first two services, it would have been possible for them to use the dataset.
The defendant cannot claim the privilege of § 60d UrhG pursuant to § 60d para. 2 sentence 3 UrhG either.
The defendant is evidently collaborating closely with commercial AI providers:
• It appears that there is a collaboration with the private company XXX, which exerts direct influence on the defendant through financing the dataset in question and staffing relevant positions at the defendant with its own employees. According to an interview statement by its founder and managing director, XXX financed the XXX dataset.
• Members of the defendant’s “team” are also “widely” commercially active in the same field for large tech companies, including as employees of the company XXX.
• In a chat on the platform “XXX,” the co-founder of the defendant, XXX, urged the timely completion of the “XXX” on 28.09.2021, as funding of $5,000 had been received from a “XXX” or his company; they should provide him with the data even if the dataset could not yet be made available to the public. The “XXX” in question is an employee of the commercial AI provider XXX.
Finally, the defendant cannot rely on a so-called “simple consent.” He – the plaintiff – did not make the disputed photograph freely accessible, but instead offered it through the agency XXX for the granting of paid licenses.
The plaintiff initially requested, besides an injunction against reproduction, information on the extent of the use of the photograph. The parties declared this request for information to be settled in the oral hearing on 11.07.2024.
The plaintiff now requests:
To order the defendant, under the threat of an administrative fine of up to 250,000 euros, alternatively, administrative detention of up to 6 months for each individual case of infringement, to refrain from reproducing or allowing the reproduction of the photograph depicted below for the creation of AI training datasets, as happened in the course of creating the XXX dataset.
[Image of the photograph follows.]
The defendant requests dismissal of the complaint.
The defendant denies that the plaintiff created the disputed image himself or is otherwise entitled to assert legal violations concerning the image in his own name, and that the plaintiff was entitled at the time the image was captured by the defendant to assert legal violations regarding the disputed image in his own name.
Most importantly, however, the single download of the disputed image in the context of creating the XXX dataset does indeed constitute a reproduction relevant under copyright law, but this is covered by the limitation provisions of §§ 44a, 44b, and 60d UrhG, as well as by simple consent of the plaintiff:
The reproduction in question is, on the one hand, covered by the limitation provision of § 44a UrhG. No permanent storage of the images occurred; rather, the images were only used temporarily for analysis and then immediately and irrevocably deleted in an automated manner. The reproduction, therefore, does not have an independent economic significance.
Moreover, the limitation of § 44b UrhG applies. Analyzing image files and extracting metadata for training artificial intelligence is, according to the legislator’s intent, a primary application of text and data mining. No creation of digital parallel archives occurred, as the downloaded images were not permanently stored, but only hyperlinks were included. The exception under § 44b para. 3 UrhG does not apply:
• According to the plaintiff’s statements, it was not he himself as the rights holder, but a third party operating the website www.XXX.com who declared this reservation; the plaintiff himself explicitly stated in his email of 13.02.2023 (Exhibit B5) that he neither had the qualification nor the financial means to declare a usage reservation.
• Furthermore, the reservation was not made explicitly, as the passage on the website www.XXX.com was formulated in general terms and listed various prohibited actions. There was no explicit mention of text and data mining or reproductions.
• Additionally, the criterion of machine readability is not met. A clause formulated in natural language is generally not machine-readable within the meaning of § 44b para. 3 UrhG. For machine readability in this sense, the clause must be capable of being processed automatically by software. This requires coding of the corresponding information. At the very least, specific keywords such as “Data Mining” must be contained in the text.
Nor is the reservation evidently intended to be such under § 44b para. 3 UrhG. The fact that the clause, according to the plaintiff’s submission, was already present on the website as of 13.01.2021 shows that the clause could not have been created “with regard to the provision in § 44b para. 3 UrhG,” as the legal provision was not yet in force at that time. Moreover, it is “also not credible” that a U.S.-based provider would rely on a reservation arising from German law.
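The keyword criterion invoked by the defendant can be illustrated with a small sketch. It is an illustration only and forms no part of the parties' submissions: the function name `mentions_any` and the keyword lists are hypothetical, and the clause text is the excerpt from the agency's terms quoted above. The sketch says nothing about which standard § 44b para. 3 UrhG actually requires.

```python
import re
from typing import List

# Excerpt of the clause as quoted from the agency's usage terms.
CLAUSE = (
    "Use automated programs, applets, bots or the like to access the "
    "XXX.com website or any content thereon for any purpose, including, "
    "by way of example only, downloading content, indexing, scraping, "
    "or caching any content on the website."
)


def mentions_any(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that occur in text as whole phrases (case-insensitive)."""
    found = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found


# A check limited to the term named by the defendant finds nothing ...
strict_hits = mentions_any(CLAUSE, ["data mining"])
# ... while a check for the activities the clause itself names does.
broad_hits = mentions_any(CLAUSE, ["scraping", "bots"])
```

The outcome of such a check depends entirely on the keyword list chosen, which is why the parties disagree on whether a clause in natural language can count as machine-readable.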
In any case, the defendant can rely on the limitation provision of § 60d UrhG:
• The defendant is a non-profit association composed of researchers and, according to the association’s statutes (Exhibit B1), is dedicated to research, in particular, to the advancement of self-learning algorithms in the sense of artificial intelligence and to making them accessible to the general public. It provides datasets and models free of charge, creates and tests its own AI models based on the training data.
• Its activities constitute “research.” Merely by making publicly transparent on the Internet how the training datasets are created, it contributes to the acquisition of knowledge about training artificial intelligence. Thus, other researchers can trace the steps of creating the datasets and build on them. Moreover, it first published a scientific paper on the disputed XXX dataset under the name “XXX” on 17.09.2022 (Exhibit B8). As of 05.04.2024, the paper in question had been cited a total of 1403 times in other scientific works and received further awards.
• Furthermore, the defendant also trains its own AI models based on the datasets it has created in order to gain insights into how AI can be improved through appropriate training.
• The natural persons who belong to the defendant, i.e., the board members and other members of the association, are also “researchers.” The fact that the plaintiff refers to individual team members being employed in the same field by large tech companies has no relevance to the question of whether the defendant itself is commercially active. Moreover, the individuals in question work for the defendant on a voluntary basis and must, therefore, earn their livelihood elsewhere.
• The fact that the datasets publicly made available by the defendant are also used by commercial providers is irrelevant to the applicability of the limitation provision of § 60d UrhG. Aside from that, the services XXX and XXX were actually not trained with the defendant’s dataset.
• The reverse exclusion clause stipulated in § 60d para. 2 sentence 3 UrhG does not apply in the present case. Although the company XXX did indeed provide the defendant with computing resources during its founding phase, the same was also done by XXX. The defendant did not receive financial support from XXX. Nor was there any further collaboration with this company. In any case, XXX does not have privileged access to the research results. Nor is there any dominant influence by the company XXX. Neither the company XXX itself nor one of its legal representatives is a member of the defendant.
Decision Grounds
I.
The admissible complaint has no merit. The defendant did infringe the plaintiff’s exploitation rights by reproducing the disputed photograph. However, this infringement is covered by the limitation provision of § 60d UrhG. Whether the defendant can additionally rely on the limitation provision of § 44b UrhG does not need to be conclusively determined against this background.
The disputed photograph is protected as a photograph according to § 72 para. 1 UrhG. After inspecting the raw data stored on the plaintiff’s laptop, the court has no doubt about the plaintiff’s status as the photographer, as per § 72 para. 2 UrhG. The plaintiff is also entitled to assert infringement claims according to § 97 UrhG, including the claim for injunctive relief under paragraph 1 of this provision. The defendant has not provided evidence that the plaintiff granted the image agency XXX more than (sub-licensable) simple usage rights. The image agency XXX marked the photograph with a watermark, which constitutes an unauthorized alteration within the meaning of § 23 para. 1 sentence 1 UrhG, so that, in principle, the plaintiff’s consent as the author was required for its exploitation.
The defendant reproduced this version of the photograph within the scope of the download in question according to § 16 para. 1 UrhG without obtaining the plaintiff’s consent.
However, the defendant was authorized to do so by legal permission. Although the reproduction was not covered by the limitation provision of § 44a UrhG (see section 1 below), and whether the defendant can rely on the limitation provision of § 44b UrhG is doubtful (see section 2 below), the reproduction act was, in any case, covered by the limitation provision of § 60d UrhG (see section 3 below).
1.
The reproduction in question is not covered by the limitation provision of § 44a UrhG. According to this provision, temporary reproductions are permitted if they are fleeting or incidental and form an integral and essential part of a technical process, and their sole purpose is to enable a transmission in a network between third parties by an intermediary or a lawful use of a work or other protected subject matter, and the reproduction has no independent economic significance.
The reproduction in question was neither fleeting nor incidental.
a) Fleeting within the meaning of § 44a UrhG is a reproduction if its lifespan is limited to what is necessary for the proper functioning of the relevant technical process, and this process is so automated that it deletes the reproduction automatically, i.e., without the involvement of a natural person, once its function to enable such a process has been fulfilled (ECJ, judgment of 16.07.2009, Case C-5/08 – Infopaq/Danske Dagblades Forening, para. 64 (juris) regarding Art. 5 para. 1 InfoSoc-RL).
Insofar as the defendant claims that the files were “automatically” deleted as part of the analysis process it carried out, this does not substantiate a fleeting reproduction within the aforementioned meaning. Apart from the fact that the defendant has not provided any details on the specific duration of storage, the deletion did not occur “independently of the user,” but rather due to a deliberate programming of the analysis process by the defendant.
b) Incidental within the meaning of § 44a UrhG is a reproduction if it neither exists independently of, nor serves a purpose independent of, the technical process of which it is part (ECJ, judgment of 05.06.2014, Case C-360/13, para. 43 (juris)).
In the present case, the image files were intentionally downloaded in order to analyze them using specific software. Thus, the download is not merely an incidental process to the analysis carried out, but a consciously and actively controlled procurement process that precedes the analysis.
2.
Whether the defendant can rely on the limitation provision of § 44b UrhG appears doubtful in the present case. Although the download performed by the defendant generally falls under the limitation provision of § 44b para. 2 UrhG, as it was carried out for the purpose of text and data mining within the meaning of § 44b para. 1 UrhG (see section a below), there are reasons to believe – without needing to make a conclusive decision – that the reproduction action was not permissible under § 44b para. 2 UrhG due to a validly declared usage reservation within the meaning of § 44b para. 3 (see section b below).
a)
The disputed reproduction action generally falls under the limitation provision of § 44b para. 2 UrhG.
(1) The disputed download was carried out for the purpose of text and data mining within the meaning of § 44b para. 1 UrhG. According to this provision, text and data mining is defined as the automated analysis of individual or multiple digital or digitized works to extract information, particularly about patterns, trends, and correlations. This is to be affirmed for the reproduction action in question (see (a) below); a teleological reduction of the limitation provision is not warranted here (see (b) below).
The court does not need to decide the further question, which has been extensively discussed in the literature, whether the training of artificial intelligence as a whole falls under the limitation provision of § 44b UrhG (see BeckOK UrhR/Bomhard, 42nd Ed. 15.2.2024, UrhG § 44b paras. 11a-11b with references; also comprehensively discussed in the study "Copyright & Training of Generative AI Models – Technological and Legal Foundations" submitted as Exhibit K11, commissioned by the Initiative Urheberrecht).
(a) The defendant undertook the reproduction action for the purpose of extracting information about “correlations” within the literal meaning of § 44b para. 1 UrhG. The defendant downloaded the disputed photograph from its original storage location in order to compare the image content with the image description already stored in the text using an available software application — evidently the XXX application from XXX.
This analysis of the image file for comparison with a pre-existing image description undoubtedly constitutes an analysis for the purpose of extracting information about “correlations” (namely, the question of whether there is agreement or disagreement between images and image descriptions). The plaintiff did not dispute that the defendant analyzed the images included in the XXX dataset in this manner.
(b) The reproduction act in question is also not to be excluded from the scope of § 44b UrhG by way of teleological reduction of the exception rule.
To the extent that the exclusion of the reproduction of data for the purpose of AI training by way of teleological reduction is occasionally supported in academic literature on the grounds that § 44b UrhG only covers the extraction of "information hidden in the data" but not the use of "the content of the intellectual creation" (Schack, NJW 2024, 113; similarly Dornis/Stober, Copyright & Training of Generative AI Models, Exhibit K11, p. 67 ff., with a differentiation between semantics and syntax), there are doubts as to whether this argument is convincing. This is because it does not sufficiently clarify what the difference is supposed to be, in the case of digitized works, between "information hidden in the data" and "the content of the intellectual creation."
To the extent that it is additionally argued that "AI web scraping" involves the intellectual content of the works used for training purposes and "ultimately" concerns the creation of identical or similar competing products (Schack, ibid.), this argument, in the Chamber’s view, does not sufficiently distinguish between:
• on the one hand, the (in this case, the only matter in dispute) creation of a dataset that can also be used for AI training, and
• on the other hand, the subsequent training of the artificial neural network with this dataset, and
• thirdly, the subsequent use of the trained AI for the purpose of generating new image content.
Although this latter functionality may already be intended during the creation of the training dataset, at the time of compiling the training dataset, it is neither foreseeable in what manner the second step (the training) will be successful, nor what specific content will be generated by the trained AI in the third step (during the AI application). Due to the rapidly evolving nature of technologies such as AI, the specific application possibilities are therefore not fully foreseeable at the time of creating the training dataset and thus cannot be legally determined with certainty. Because of this legal uncertainty, the initial general intention at the time of creating the training dataset—to eventually obtain AI-generated content—is not a suitable criterion for assessing the legality of creating the training dataset itself.
To the extent that a teleological reduction of the exception rule in § 44b UrhG is argued on the grounds that the European legislator in 2019, when establishing the underlying directive provision (Article 4 of the DSM Directive), "simply did not have the AI issue in mind" (Schack, ibid.; similarly Dornis/Stober for the training of AI models, ibid., pp. 71 ff., 87 ff.), this finding alone is evidently not sufficient for a teleological reduction. It must be taken into account, in particular, that the technological advancements in the field of so-called Artificial Intelligence since 2019 have affected less the nature and scope of the (disputed) data mining to obtain training data but rather the performance capabilities of the artificial neural networks trained with the data (accordingly, Dornis/Stober, ibid., p. 95, also assume that the mere creation of training datasets "prior to the actual training" would indeed fall under the TDM exception). Moreover, it should be noted that the Common Crawl Foundation’s database, accessed by the defendant, has existed since as early as 2008 (!), cf. Common Crawl Overview. Apart from that, the current European legislator has unequivocally expressed in the AI Regulation (Regulation (EU) 2024/1689 of June 13, 2024, Official Journal L of July 12, 2024, p. 1) that the creation of datasets intended for training artificial neural networks also falls under the exception rule of Article 4 of the DSM Directive. This is because, according to Article 53 (1) lit. c of the AI Regulation, providers of general-purpose AI models are required to implement a strategy, in particular, to identify and comply with a rights reservation asserted under Article 4 (3) of the DSM Directive.
Furthermore, the view that the creation of datasets intended for training artificial neural networks falls under the exception rule of Article 4 of the DSM Directive also corresponds to the assessment of the German legislator during the implementation of the aforementioned exception provision in 2021 (Explanation in the Government Draft, Bundestag Printed Matter 19/27426, p. 60).
(c) The so-called 3-step test enshrined in Article 5 (5) of the InfoSoc Directive (in conjunction with Article 7 (2) sentence 1 of the DSM Directive) also does not justify a different assessment. According to this, the standardized exceptions may only be applied in certain special cases where the normal exploitation of the work or other protected subject matter is not impaired and the legitimate interests of the rights holder are not unduly prejudiced. These requirements are met in the present case.
The reproduction relevant under copyright law in this case is limited to the purpose of analyzing the image files for their correspondence to a pre-existing image description, followed by incorporation into a dataset. There is no indication, nor has the plaintiff claimed, that this use would impair the potential exploitation of the respective works.
While the dataset created in this way may subsequently be used to train artificial neural networks, and the resulting AI-generated content may compete with works created by (human) authors, this alone does not justify viewing the mere creation of training datasets as an impairment of the exploitation rights of works within the meaning of Article 5 (5) of the InfoSoc Directive. This must apply, if only for the reason that considering merely future technological developments—which cannot yet be fully foreseen—does not allow for a legally certain distinction between permissible and impermissible uses (see similarly above (b)).
Since, based on current technological developments, it can never be ruled out with certainty that insights gained through text and data mining will be used to train artificial neural networks that may then compete with human authors, the opposing view would ultimately require a complete prohibition of text and data mining within the meaning of § 44b UrhG. However, such a complete nullification of the exception rule would obviously contradict the legislative intent and, therefore, cannot represent a viable interpretation.
(2) The image file downloaded by the defendant was also—something that the plaintiff, moreover, does not dispute—lawfully accessible within the meaning of § 44b (2) sentence 1 UrhG.
“Lawfully accessible” in this sense means, in particular, a work that is freely accessible on the internet (Explanation in the Government Draft, Bundestag Printed Matter 19/27426, p. 88). This applies to the image downloaded by the defendant. Contrary to the plaintiff’s initial assertions, the defendant did not download the “original image” described in the injunction request initially formulated in the statement of claim, which the photo agency XXX would have made available only upon purchase of a license. Rather, it downloaded a version of the image marked with a watermark from the photo agency. This was evidently the preview image uploaded on the agency’s website for promotional purposes. This preview image with the watermark, however, was made freely accessible on the internet by the agency.
b)
However, there is much to suggest that in the present case, the exception rule under § 44b (2) UrhG does not apply—without the need for a final decision on this—since a validly declared reservation of use within the meaning of paragraph 3 of the provision existed. In particular, the reservation of use unambiguously declared on the website XXX.com likely meets the requirements for machine readability as stipulated in § 44b (3) sentence 2 UrhG.
(1) There is much to indicate that the reservation of use declared on the agency’s website was issued by an authorized person and that the plaintiff can invoke it to protect his own rights. According to the wording of § 44b (3) UrhG, “the rights holder” can declare the reservation of use.
Therefore, it is not only the declarations of the original copyright holder that should be considered, but also those of subsequent rights holders, whether they are legal successors or holders of derivative rights from the original author. According to the plaintiff’s plausible argument (record from 11.07.2024, p. 3, sheet 122 of the case file), he had granted the photo agency XXX simple, sublicensable usage rights to the original image. Thus, the photo agency itself became the rights holder of the images posted on its website and could therefore issue a reservation of use under § 44b (3) UrhG without further ado. There is no indication, nor has it been claimed, that any conflicting agreements affecting property rights existed between the plaintiff and the photo agency.
The plaintiff is likely entitled to invoke this reservation of use declared by his licensee. Economically speaking, the exploitation of the original photo in question took place through the agency. Thus, in practice, the specific decision as to which third party would be granted the right to use the image lay with the agency; there was no obligation to conclude contracts. In such a situation, from the Chamber’s perspective, there is much to suggest that the original author may rely on a reservation of use declared by his licensee under § 44b (3) UrhG when asserting his remaining rights to prohibit use.
(2) The defendant’s objection that the prohibition on use by web crawlers, declared in the agency’s General Terms and Conditions with respect to its customers, could not have been formulated "in relation to § 44b (3) UrhG" for temporal reasons is irrelevant. For the declaration to take legal effect, it is not required that it be made with specific reference to a particular legal provision.
(3) The reservation is also formulated with sufficient clarity. Article 4 (3) of the DSM Directive requires an explicit declaration of the reservation of use. This requirement for explicitness must therefore be taken into account in a directive-compliant interpretation of § 44b (3) UrhG (as also stated in the Explanation in the Government Draft, Bundestag Printed Matter 19/27426, p. 89). Consequently, the declared reservation must be made both expressis verbis (not impliedly) and with such precision (specific and individualized) that it unequivocally covers a particular content and a specific use (Hamann, ZGE 16 (2024), p. 134). The reservation of use formulated on the website of the photo agency XXX meets these requirements without difficulty.
To the extent that it is argued that a reservation of use declared for all works uploaded on a website would violate the explicitness requirement of § 44b (3) UrhG (thus expanding the abstract reasoning of Hamann, ibid., p. 148), this is not convincing. Even a reservation explicitly declared for all works uploaded on a website is clearly definable in its scope and content and is therefore explicitly declared.
(4) Finally, there is much to suggest that the reservation of use meets the requirements for machine readability within the meaning of § 44b (3) sentence 2 UrhG.
In light of the legislative intent underlying it, namely to enable automated queries by web crawlers (see Explanation in the Government Draft, Bundestag Printed Matter 19/27426, p. 89), the term “machine readability” should indeed be understood in the sense of "machine understandability" (extensively discussed by Hamann, ibid., pp. 113, 128 ff.).
The Chamber, however, tends to consider a reservation of use expressed solely in "natural language" as "machine-understandable" (contrary to the prevailing opinion in academic literature, see Hamann, ibid., pp. 131 ff., 146 ff., with references to the prevailing view, where reference is also made to a contribution by the defendant’s representatives, namely Akinci/Heidrich, IPRB 2023, pp. 270, 272, who apparently support the Chamber’s view; however, the Chamber did not have direct access to this contribution until the drafting of the judgment). However, the question of whether and under what specific conditions a reservation of use expressed in “natural language” can also be considered “machine-understandable” must always be answered based on the technical developments prevailing at the relevant time of use of the work.
Accordingly, the European legislator has also stipulated within the framework of the AI Regulation that providers of AI models must implement a strategy, in particular to identify and comply with a rights reservation asserted under Article 4 (3) of the DSM Directive, “even through the use of state-of-the-art technologies” (Article 53 (1) lit. c of the AI Regulation). These "state-of-the-art technologies" undoubtedly include, in particular, AI applications capable of comprehending text written in natural language (this is evidently also the view of the defendant’s representatives, Akinci/Heidrich, in the contribution IPRB 2023, 270, 272, which was not directly accessible to the Chamber but is cited by Hamann, ibid., p. 148, who generally affirms this possibility from a technical standpoint, ibid.). Everything points to the fact that the legislator of the AI Regulation had such AI applications in mind when referencing “state-of-the-art technologies.”
An objection to this view is that it leads to a circular argument: if it is required that the operator of text and data mining must use AI applications to check whether a reservation of use has been declared, then this AI-based search itself would require pattern analysis, which would already constitute text and data mining within the meaning of § 44b (1) UrhG; in other words, the application of the exception would decide the legality of its own application (Hamann, ibid., p. 148). The Chamber does not share this assessment: the copyright-relevant use that requires justification is not, contrary to the above view, the conduct of “pattern analysis” itself, but rather the reproduction of the copyright-protected work within the meaning of § 16 UrhG. The notion that the prior identification of such works on the internet and their verification to see if reservations have been declared within the meaning of § 44b (3) sentence 2 UrhG necessarily requires an additional layer of text and data mining within the meaning of § 44b (1) UrhG does not appear compelling, as one could particularly think of webpage content processing through the use of web crawlers, where only ephemeral and incidental reproductions are made, which are themselves already justified under § 44a UrhG.
Furthermore, it is argued against the Chamber’s broader interpretation of the term “machine readability” that this term is understood more narrowly by the European legislator in a different context. Reference is made here to Recital 35 of the PSI Directive (Directive (EU) 2019/1024), which, for “machine readability” within the meaning of this Directive, requires “simple” recognizability, among other things (BeckOK UrhR/Bomhard, 42nd Ed. 15.2.2024, UrhG § 44b margin no. 31 with further references); this requirement cannot be met by a reservation expressed solely in natural language. Such an argument, however, presumes that the terminology of both directives must be understood in the same way. The Chamber has doubts as to whether such an equivalence of terms is convincing, as the directives have different objectives: While the PSI Directive concerns the purely unilateral access of the public to information and the purely unilateral obligation of public authorities to publish certain information, Article 4 (3) of the DSM Directive aims to balance the interests of users of text and data mining (to be able to conduct it as simply and legally securely as possible) and the interests of rights holders (to secure their rights as simply and effectively as possible). In the Chamber’s view, this balance of interests cannot be resolved unilaterally in favor of text and data mining users by deeming only the simplest technical solution for them to be sufficient for the validity of a declared reservation of use. 
Such an understanding would also contradict the intention of the legislator of the DSM Directive, who, in Recital 18, does not demand that a reservation be declared “in the simplest way possible,” but rather “in an appropriate manner.” Likewise, the German legislator, when implementing the directive, requires only a declaration made “in a manner appropriate to the automated processes used in text and data mining” (Explanation in the Government Draft, Bundestag Printed Matter 19/27426, p. 89).
Furthermore, in the Chamber’s view, it would be inconsistent in terms of the underlying value judgments to allow the development of increasingly powerful text-understanding and text-generating AI models through the exception rule in § 44b (2) UrhG while not requiring the use of already existing AI models within the limitation rule in § 44b (3) sentence 2 UrhG.
Admittedly, the plaintiff has not yet demonstrated to what extent sufficient technology for the automated semantic recognition of the disputed reservation of use was available at the time of the reproduction act in 2021; he has referred only to services available in 2023 (Reply, p. 14 ff., sheet 48 ff. of the case file). However, there are indications that the defendant already had access to suitable technology. According to the defendant’s own submissions, the analysis conducted in the context of creating the dataset XXX, in the form of comparing image content with pre-existing image descriptions, clearly required semantic recognition of these image descriptions by the software used. In this context, there is much to suggest that, particularly for the defendant, systems were already available in 2021 that were capable of recognizing a reservation of use formulated in natural language in an automated manner.
3.
However, the defendant can rely on the exception rule under § 60d UrhG for the reproduction act in dispute.
According to this provision, reproductions for text and data mining for the purpose of scientific research are permitted by research organizations.
a)
The reproduction was carried out, as outlined above, for the purpose of text and data mining within the meaning of § 44b (1) UrhG. Furthermore, it was also done for the purposes of scientific research within the meaning of § 60d (1) UrhG.
Scientific research generally refers to the methodical and systematic pursuit of new knowledge (Spindler/Schuster/Anton, 4th edition 2019, UrhG § 60c margin no. 3; BeckOK UrhR/Grübler, 42nd edition 1.5.2024, UrhG § 60c margin no. 5; Dreier/Schulze/Dreier, 7th edition 2022, UrhG § 60c margin no. 1). The term “scientific research” is not to be understood narrowly, as it already considers the methodical and systematic “pursuit” of new knowledge to be sufficient. It does not only encompass the steps directly linked to the generation of new insights; rather, it suffices if the step in question is aimed at achieving a (later) knowledge gain, as is the case with many data collections, which must first be carried out to later draw empirical conclusions. Specifically, the term “scientific research” does not require a successful research outcome.
Thus, contrary to the plaintiff’s view, the creation of a dataset of the type in question, which can serve as a basis for training AI systems, can certainly be considered scientific research in the above-mentioned sense. Although the creation of the dataset itself may not yet be associated with a knowledge gain, it is a fundamental step aimed at using the dataset for the purpose of later knowledge acquisition. Such an objective can be affirmed in the present case. It is sufficient that the dataset was—undisputedly—published for free and thus made available, particularly to researchers working in the field of artificial neural networks. Whether the dataset is also used by commercial enterprises for training or further development of their AI systems, as the plaintiff claims regarding the services XXX and XXX, is irrelevant, because even research conducted by commercial enterprises is still research—although not privileged as such under §§ 60c ff. UrhG. Therefore, the disputed question between the parties as to whether the defendant, beyond the creation of such datasets, also engages in scientific research in the form of developing its own AI models is not relevant in this context.
b)
The defendant also does not pursue commercial purposes within the meaning of § 60d (2) no. 1 UrhG.
For determining whether research is non-commercial, only the specific nature of the scientific activity is relevant, while the organization and funding of the institution where the research takes place are irrelevant (Recital 42 of the InfoSoc Directive).
The non-commercial purpose pursued by the defendant in relation to the creation of the disputed dataset XXX is already evident from the fact that the defendant, undisputedly, made this dataset freely available to the public. There is no indication, nor has the plaintiff presented any evidence, that the development of the disputed dataset would serve, at least in part, to develop the defendant’s own commercial offering (see BeckOK IT-Recht/Paul, 14th Ed. 1.4.2024, UrhG § 60d margin no. 10 regarding this criterion). The fact that the disputed dataset may also be used by commercially active companies for training or further development of their AI systems is irrelevant for determining the nature of the defendant’s activity. The mere fact that some members of the defendant organization are also engaged in paid activities at such companies, in addition to their work for the association, is not sufficient to attribute these companies’ activities to the defendant as its own.
c)
The defendant is also not precluded from invoking the exception rule under § 60d UrhG based on § 60d (2) sentence 3 of the provision.
According to this, research organizations that collaborate with a private company that has a determining influence on the research organization and preferential access to the results of the scientific research cannot rely on the exception rule of § 60d UrhG. The burden of presenting evidence for the factual prerequisites of the counter-exclusion under § 60d (2) sentence 3 UrhG lies, according to the wording of the norm, with the plaintiff.
(1) To the extent that the plaintiff initially argued in the reply that the company XXX has direct influence over the defendant through the funding of the dataset in question and the placement of “key positions” at the defendant organization with its own employees (Reply, p. 18, sheet 52 of the case file), this argument lacks substance.
In this regard, the plaintiff merely points out that one of the co-founders of the defendant organization, Mr. XXX, is employed by XXX as “Head of Machine Learning Operations” and that another member of the defendant organization, Mr. XXX, is employed there as a “Research Scientist” (Reply, p. 4 ff., sheet 38 ff. of the case file). However, the employment of two association members at the company XXX alone does not prove a determining influence of this company on the research work of the defendant.
Moreover, the plaintiff has not even claimed that the defendant granted the company XXX preferential access to the results of its scientific research, specifically the disputed dataset. Rather, it is merely stated that XXX trained its service XXX using the disputed dataset (Reply, p. 8 ff., sheet 42 ff. of the case file).
(2) To the extent that the plaintiff, in the written submission dated July 3, 2024, refers to a chat on the platform XXX that took place in 2021, in which the co-founder of the defendant organization, Mr. XXX, allegedly expressed his willingness to grant the company XXX early access to the (then smaller) dataset in exchange for a funding contribution of $5,000, this submission also does not meet the exception criteria under § 60d (2) sentence 3 UrhG.
It can remain undecided whether this chat exchange, which the defendant has not disputed as such (see submission dated July 9, 2024, p. 3, sheet 112 of the case file), actually supports the interpretation drawn by the plaintiff. Likewise, it can remain undecided whether such a willingness to grant early access (even if such access was actually provided, which the plaintiff has not alleged) would suffice to constitute preferential access to research results within the meaning of § 60d (2) sentence 3 UrhG.
For, in any case, it has neither been demonstrated by the plaintiff nor is it otherwise apparent that the company XXX had a determining influence over the defendant organization. To the extent that any personnel connections between the defendant and companies in the AI industry have been shown, they relate to the companies XXX and XXX (Reply, p. 4 ff., sheet 38 ff. of the case file).
II.
The decision on costs is based on § 91 (1) and § 91a (1) sentence 1 of the German Code of Civil Procedure (ZPO).
The decision on provisional enforceability is issued in accordance with § 708 no. 11, §§ 711, 709 ZPO.
Hartmann, Presiding Judge
Dr. Kanzler, Judge
Dr. Woodtli, Judge