A Short Introduction to Datasets, Licensing and the Current EU Copyright Reform (by Dimitra Iordanidou)

In the digital environment, many people use datasets every day, across a wide range of activities and sectors. In science, for instance, datasets have contributed to the development of research by making it easier to collect and organize data. To understand their importance, a proper definition of their scope is necessary. A dataset is any organized collection of data defined by a theme or category that reflects what is being measured, observed or monitored. It may include photographs taken during a research project, an archive of material relating to a particular topic, a film, or a collection of interview scripts.

Given the breadth of datasets today, a common concern is how they can be protected by law. Datasets can be protected through intellectual property law (for example, if they contain original literary, musical, dramatic or artistic works, they may be protected by copyright) and may also be granted specific database rights. So what if a person wants to use a dataset created by someone else? The answer is a contract with the dataset owner, commonly referred to as a ‘license’.

There are many types of licenses available. However, considering the key role of datasets for various institutions and researchers, the Open Access movement has encouraged free access to, and distribution of, the work contained in datasets and their derivative works, subject to proper attribution of authorship. To this end, the global non-profit organisation Creative Commons offers three types of licenses compatible with Open Access: the CC0 1.0 (a waiver), the CC-BY v.4.0 (attribution of the original owner) and the CC-BY-SA v.3.0 (attribution for the dataset and its derivatives). Additionally, there is the OGL v.3.0, an Open Government License that requires acknowledgement of the original sources of public sector information. Finally, if the dataset includes a database, an ODC-BY v.1.0 license is available to apply to the structure of the database.

Digital activities, however, are also regulated by copyright law. In this area, the upcoming and long-debated EU Copyright Reform, part of the European Union's general Digital Single Market Strategy, is going to have a heavy impact on what copyright will look like in the future. Although the text of the new Copyright Directive keeps changing during the complex EU legislative process, three articles included in the Commission’s Proposal of September 2016 seem highly problematic.

Firstly, the proposed article 3 (text and data mining) expects copyright to protect the information extracted by computers, ignoring the fact that copyright has no experience in this field and offers no justification for extracting value from data and statistics. The proposed exemption should therefore probably not be placed within copyright law at all.

Secondly, the proposed article 11 (press publishers’ right) is based on the idea that value created in the EU is appropriated not by the person who wrote the content but by news aggregators (usually US platforms). However, simply adding another right that requires an extra payment is probably going to fail, since even lawful uses would have to be paid for a second time.

Finally, the most controversial article of the Copyright Reform is article 13 (intermediaries’ liability for secondary infringement). It is based on the so-called ‘value gap’, in other words the gap between the revenues that intermediaries generate from the use of copyright-protected material and the revenues of rightholders. The current legal regime of intermediary liability (E-Commerce Directive, articles 12-15) offers certain internet intermediaries, under specific conditions, a safe harbor. Under this provision, intermediaries are not liable for wrongful activities initiated by their users. This delicate balance has achieved a shared and fair distribution of responsibility between rightholders and internet intermediaries, which will change radically if the proposed article 13 is adopted. Imposing a heavy burden on intermediaries will not only reduce the participation of online users in platforms but may also limit fundamental rights, such as the freedom of expression and the right to information. Since this area of law is heavily lobbied, special attention is needed in order to save the Internet!