Unleashing Big Data’s Potential for journalism, economy and research

Wed, Apr 04, 2018

Text and Data Mining (TDM) lets us make sense of the vast amount of data that is out there. Understanding this data is critical to advancing our knowledge in climate change research, breaking corruption scandals in the press, discovering breakthrough medical treatments and training computers to improve customers’ experience online.

However, there are concerns that the current reform of EU copyright rules could limit who gets to use TDM and how they get to use it.

To build strong scientific datasets or to train Artificial Intelligence (AI) algorithms, researchers need to gather data from a broad range of sources, including scientific publications to which they have acquired lawful access through licensing agreements, and data that is publicly available on the internet (and not behind a paywall). We need to make sure that the right to read this data includes the right to understand and analyse it.

We want EU policymakers to understand that TDM is not about copying or re-using creative works without paying. TDM is about analysing works we have legally accessed in order to identify the patterns, facts, and correlations locked within them, such as the tone of scientific or journalistic articles or how many times specific words are used. TDM does not harm rightsholders. In fact, the more data analysis takes place, the more TDM users will request lawful access, increasing the demand for subscriptions to articles.

Many research projects are public-private partnerships. In fact, the European Commission’s Horizon 2020 programme – the largest research programme globally – envisages collaboration between public and private entities as they take “great ideas from the lab to the market”. This programme usually requires that approved projects have another source of funding, typically private funding. If the private partner of a Horizon 2020 funded consortium cannot use TDM on the same basis as a public partner, this would greatly restrict the ability to fund AI projects at a time when such research is a critical element of growing the EU’s digital economy.

Journalists do not qualify as non-commercial beneficiaries either, yet they need tools to make sense of the increasing amount of information at their disposal. TDM technologies have helped uncover crucial stories with significant impact on society and democracy, such as the Panama Papers. With the growing threat of fake news, which we know can be best tackled by algorithms and data analytics tools, we should not undermine the quality of journalism in Europe by raising unjustified copyright barriers.

Being able to verify the data used in AI is critical to understanding and addressing errors, bias, and needed improvements. Building adequate datasets is the first step in a TDM-based research project, and it can take several weeks, sometimes months. Access to those datasets once the research is completed is necessary to verify any findings. To make that possible, we need to be able to safely store incidental copies of the datasets on secure servers. The copyright reform as it stands today does not allow this. Without stored data that would allow the public to verify research conducted in Europe, we risk losing citizens’ trust in science.

Like the European Commission, we have big ambitions for Europe when it comes to Artificial Intelligence. We also want Europe to lead the global AI agenda and adopt a future-proof copyright reform that will unleash big data’s potential for journalism, economy and research in Europe.

To achieve this ambition, we need an equally ambitious TDM exception.

About the authors
  • Alan Akbik is a research scientist at Zalando Research, working mostly on natural language processing
  • Ari Asmi is a researcher at the Institute for Atmospheric and Earth System Research at the University of Helsinki
  • Adriana Homolova is an investigative journalist, leader of the Elvis, Map me tender project
  • Michèle B. Nuijten is an Assistant Professor at the Meta-Research Centre at the University of Tilburg
  • Philipp-Andreas Schmidt is Government Affairs Manager at Bayer AG

This op-ed is based on the event of the same name, which took place in the European Parliament on 20 February, and was initially published in the Parliament Magazine.
