About
This page is curated by Christian Peukert. Any questions or comments are welcome.
Method
Overview
We have constructed a comparative dataset of copyright exceptions relevant to research uses, including private study, fair use or fair dealing, institutional copying, and text and data mining (TDM). The objective is to trace and classify the statutory evolution of these exceptions across jurisdictions, from the law in force on 1 January 1990 to the law in force in 2025. The project extends Fiil-Flynn et al. (2022) by using automated data collection to enable time-variant comparative analysis of copyright law.
The methodological approach combines:
- Comparative legal coding based on the typology of research exceptions developed by Flynn et al. (2022).
- Large-scale statutory extraction and annotation using a large language model (LLM).
- Structured verification rules to ensure that only primary statutory sources are used.
The resulting dataset records the timeline of relevant statutory provisions, their textual content, and their classification according to the openness of the research exception.
Data Collection
Jurisdictional Coverage
The unit of analysis is a national jurisdiction. For each jurisdiction, we construct a statutory timeline that begins with the copyright law in force on 1 January 1990 and continues with all subsequent amendments relevant to research-related exceptions.
Only statutory provisions governing the following topics are included:
- research use
- private study
- fair use
- fair dealing
- text and data mining
- reproduction for research purposes
- sharing or communication of research copies
- institutional copying for research
Events unrelated to these topics are excluded.
Source Selection
Statutory texts are collected exclusively from primary legal sources. Accepted sources include:
- official national legislative databases
- official government gazettes
- official parliamentary or ministry websites hosting statutory texts
- the WIPO Lex legal database when it provides the actual statutory text or links to the official law
Secondary materials such as case law summaries, blog posts, academic commentary, or legal guides are not used as sources of statutory text.
When multiple language versions exist, the primary legislative language of the jurisdiction is used. If the source provides an official English translation, it is recorded as well. If no official translation exists, an unofficial translation is produced for documentation purposes.
Statutory Timeline Construction
For each jurisdiction, we identify three categories of events:
- Baseline event: The copyright statute verified to be in force on 1 January 1990.
- Relevant updates: Subsequent statutory amendments that:
  - introduce a new research-related exception
  - remove or repeal an exception
  - expand or narrow the scope of an exception
  - materially alter the wording in ways that affect legal scope
- Current law confirmation: A final entry confirming the statutory provisions governing research exceptions in 2025.
All events are dated using the date the law entered into force, not the date of enactment.
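One way to picture a jurisdiction's timeline is as a list of dated event records. The sketch below is illustrative only: the `StatutoryEvent` class, its field names, the `kind` labels, and the `build_timeline` helper are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import date

BASELINE_DATE = date(1990, 1, 1)

@dataclass(frozen=True)
class StatutoryEvent:
    jurisdiction: str
    kind: str            # assumed labels: "baseline", "update", "current"
    in_force: date       # date of entry into force, not enactment
    provision_text: str  # verbatim statutory text

def build_timeline(events):
    """Sort events by entry-into-force date and check the expected shape."""
    ordered = sorted(events, key=lambda e: e.in_force)
    if not ordered or ordered[0].kind != "baseline":
        raise ValueError("timeline must start with a baseline event")
    if ordered[0].in_force > BASELINE_DATE:
        raise ValueError("baseline law must already be in force on 1 January 1990")
    if ordered[-1].kind != "current":
        raise ValueError("timeline must end with the current law confirmation")
    return ordered

# Hypothetical jurisdiction and placeholder text, for illustration only.
events = [
    StatutoryEvent("Exampleland", "current", date(2025, 1, 1), "..."),
    StatutoryEvent("Exampleland", "baseline", date(1985, 6, 1), "..."),
    StatutoryEvent("Exampleland", "update", date(2003, 4, 1), "..."),
]
timeline = build_timeline(events)
```

Sorting on `in_force` rather than on an enactment date mirrors the dating rule stated above.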
Identification of Research-Relevant Exceptions
Before classifying a jurisdiction, the full set of potentially relevant statutory provisions is identified. These may include:
- general fair use clauses
- fair dealing for research or private study
- specific research exceptions
- private copying or private use exceptions
- library and institutional copying provisions
- educational exceptions covering research
- TDM-specific provisions
- quotation exceptions
- provisions implementing the Berne three-step test
Each provision is reviewed to determine whether it could authorize reproduction or other uses of copyrighted works in the course of research.
When multiple exceptions exist simultaneously, the jurisdiction is classified according to the most permissive exception applicable to research uses.
Classification of Research Exceptions
The legal classification follows the typology developed by Flynn et al. (2022), which evaluates the openness of copyright exceptions across three dimensions:
- Uses: Whether the exception applies only to reproduction or also to communication, distribution, or other exclusive rights.
- Works: Whether the exception applies to all categories of works or excludes certain types.
- Users: Whether the exception applies to any user or only to individuals or institutions.
Based on these dimensions, each jurisdiction is assigned one of six categories.
| Color | Definition |
|---|---|
| GREEN | Fully open exception allowing reproduction and sharing of works for research by any user |
| BLUE | Reproduction of full works allowed for research, but sharing not permitted |
| LIGHT BLUE | Reproduction allowed only for private or personal research by individuals |
| PURPLE | Reproduction allowed only for institutions such as libraries or research organizations |
| ORANGE | Full reproduction allowed but limited to certain categories of works |
| RED | Only excerpts or quotations allowed; full-work reproduction not permitted |
At each point in a jurisdiction's timeline, the classification reflects the most permissive research-relevant exception in force on that date.
This approach follows Flynn et al. (2022), who evaluate the openness of copyright exceptions according to their applicability to any use, any work, and any user.
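The coding logic can be sketched roughly as follows. This is a simplified illustration, not the coding protocol itself: the `ExceptionProfile` fields, the precedence applied when a provision is restricted on more than one dimension, and the use of the table's row order as an openness ranking are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed ranking, most open first, taken from the row order of the table.
OPENNESS_ORDER = ["GREEN", "BLUE", "LIGHT BLUE", "PURPLE", "ORANGE", "RED"]

@dataclass(frozen=True)
class ExceptionProfile:
    full_works: bool  # False means only excerpts or quotations are allowed
    sharing: bool     # True if sharing or communication is also permitted
    all_works: bool   # False if limited to certain categories of works
    users: str        # "any", "individuals", or "institutions"

def classify(p):
    """Map one provision to a color (assumed precedence of restrictions)."""
    if not p.full_works:
        return "RED"
    if not p.all_works:
        return "ORANGE"
    if p.users == "institutions":
        return "PURPLE"
    if p.users == "individuals":
        return "LIGHT BLUE"
    return "GREEN" if p.sharing else "BLUE"

def most_permissive(categories):
    """Pick the most open category among all exceptions in force."""
    return min(categories, key=OPENNESS_ORDER.index)

# A statute with a quotation right and a private-study exception.
provisions = [
    ExceptionProfile(full_works=False, sharing=True, all_works=True, users="any"),
    ExceptionProfile(full_works=True, sharing=False, all_works=True, users="individuals"),
]
jurisdiction_color = most_permissive([classify(p) for p in provisions])
```

Classifying every provision first and then taking the most open result keeps the jurisdiction-level code tied to the full inventory of exceptions rather than to a single amendment.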
LLM-Based Legal Annotation
Model
Statutory extraction and annotation are performed using a state-of-the-art LLM with web search capabilities.
The model is used to assist with:
- identifying relevant statutory provisions
- extracting verbatim statutory text
- constructing structured legal timelines
- classifying exceptions according to the Flynn et al. typology
Prompt Design
The model is guided by a structured legal-history extraction prompt. The prompt requires the model to:
- Search for and retrieve statutory texts from primary legal sources.
- Identify the baseline copyright law in force on 1 January 1990.
- Detect all subsequent amendments affecting research-related exceptions.
- Extract the statutory language verbatim.
- Produce a structured timeline in JSON format.
Strict constraints are imposed to minimize hallucination:
- The model must rely only on primary statutory texts.
- Non-statutory sources such as commentary or case law are disallowed.
- Dates must correspond to entry into force, not enactment.
- If a statutory text cannot be verified, the model must explicitly state the missing information.
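Constraints of this kind can also be checked mechanically once the model has returned its JSON timeline. The following is a minimal sketch under stated assumptions: the field names (`date_in_force`, `source_type`, `statutory_text`, `missing_information`) and the source-type labels are hypothetical and do not describe the actual output schema.

```python
import json
from datetime import date

# Assumed labels for the accepted primary source types.
PRIMARY_SOURCE_TYPES = {
    "legislative_database",
    "government_gazette",
    "parliament_or_ministry_site",
    "wipo_lex",
}

def check_event(event):
    """Return a list of problems found in one timeline entry."""
    problems = []
    if event.get("source_type") not in PRIMARY_SOURCE_TYPES:
        problems.append("source is not a primary statutory source")
    try:
        date.fromisoformat(event.get("date_in_force", ""))
    except (TypeError, ValueError):
        problems.append("date_in_force is not a valid ISO date")
    # Unverified text is acceptable only if the gap is stated explicitly.
    if not event.get("statutory_text") and not event.get("missing_information"):
        problems.append("no verbatim text and no note on what is missing")
    return problems

def check_timeline(raw_json):
    """Map the index of each problematic entry to its list of problems."""
    events = json.loads(raw_json)
    return {i: p for i, e in enumerate(events) if (p := check_event(e))}
```

Entries that fail such a check would be flagged for manual review rather than silently corrected, which keeps the explicit reporting of missing information intact.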
Reliability Measures
Several design features improve reliability:
- Source hierarchy restrictions: Only official statutory texts are accepted.
- Explicit uncertainty reporting: When statutory information cannot be verified, the dataset records the missing information.
- Full exception inventory: Classification is based on the most permissive exception in the entire statute, not only the provision introduced by an amendment.
- Verbatim text extraction: Statutory language is copied directly from the legal source to allow later auditing.
Limitations
This methodology faces several limitations.
First, statutory interpretation can vary across jurisdictions. The classification therefore reflects the statutory text on its face, rather than judicial interpretation.
Second, machine translation may introduce minor inaccuracies when statutes are not available in English.
Third, some historical statutes are not digitally available, which may limit verification of the exact text in force in 1990.
Despite these limitations, the combination of structured legal prompts, strict source requirements, and the Flynn typology provides a scalable method for comparative analysis of research exceptions across jurisdictions.
References
Flynn, S., Schirru, L., Palmedo, M., & Izquierdo, A. (2022). Research exceptions in comparative copyright. Program on Information Justice and Intellectual Property Research Paper Series No. 75.
https://digitalcommons.wcl.american.edu/research/75
Fiil-Flynn, S. M., Butler, B., Carroll, M., Cohen-Sasson, O., Craig, C., Guibault, L., ... & Contreras, J. L. (2022). Legal reform to enhance global text and data mining research. Science, 378(6623), 951-953.
https://www.science.org/doi/10.1126/science.add6124