• What is TDM?

    TDM stands for “Text and Data Mining”, which is the automated processing of large amounts of structured digital content for purposes of information retrieval, extraction, interpretation, and analysis.  It is designed to understand the unprotected facts, ideas, and concepts contained in text and other works.
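
    To make the definition concrete, the following is a minimal sketch of one simple TDM task, not a description of any particular research tool. It counts how many documents in a collection mention each term. The sample documents, the stopword list, and the function name `mine_terms` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical sample collection; a real project would process far more
# material, obtained through lawful access.
DOCUMENTS = [
    "The vaccine trial reported fewer infections in the treated group.",
    "Researchers reported that the vaccine reduced infections.",
    "The outbreak spread before the vaccine was available.",
]

# Hypothetical list of common words to ignore.
STOPWORDS = {"the", "in", "that", "was", "before", "a", "of"}


def mine_terms(documents):
    """Count how many documents mention each term (document frequency)."""
    counts = Counter()
    for text in documents:
        # Lower-case the text, split it into words, and drop common words.
        # Using a set means each term counts at most once per document.
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        counts.update(terms)
    return counts


term_counts = mine_terms(DOCUMENTS)
top_terms = term_counts.most_common(3)
```

    The output in this sketch is a table of terms and counts, that is, facts about the collection rather than copies of the original sentences.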

  • What is Machine Learning?

    Machine learning is the ability of computers to learn without being explicitly programmed, which involves the recognition of patterns in data that has been processed by TDM.

  • Why is TDM so important to researchers?

    TDM aids research by tasking computers with the job of sifting through and analysing data contained in a variety of works, without being told where to look. This lets researchers more productively identify and anticipate relationships, patterns, knowledge, trends and hidden insights.

  • Why is TDM important to business?

    Startups and businesses of all kinds increasingly use machine learning to develop algorithms that learn from data and make predictions, helping them understand business trends, research new markets, and develop new technologies such as Artificial Intelligence applications (which rely heavily on text and data mining).

  • What are examples of beneficial innovations currently enabled by TDM?

    Health and Medicine: aiding vaccination efforts by using online media data to identify clusters of anti-vaccination sentiment, using TDM across scientific research journals to help develop individualised cancer treatment strategies, predicting disease outbreaks by analysing online news media and other data.

    Science: crawling Reddit postings to analyse how complex social networks evolve and change.

    Public Interest: assessing specific community well-being by crawling and analysing online hyperlocal posts and open data sources.

    Public Safety: using TDM applied to online consumer reviews for early identification of unsafe products, crawling online government data to visualise public threats over a defined time period, identifying accessible sidewalk routes for travellers with impaired mobility by crawling online maps and online posts.

    Transportation: training autonomous cars to avoid collisions, developing mobile apps that can enhance discovery of safe footpaths and pedestrian access ways.

    Business: making predictions about the growth of project categories by analysing online user comments and discussion groups (e.g. Reddit, Twitter, Yelp), identifying and analysing changes in brand sentiment by evaluating online social media posts.

  • What kinds of materials are useful for TDM and Machine Learning?

    All types of material perceptible to humans – text, data, images, video, sound recordings – can be analysed by machines to improve our understanding of the facts and ideas in them. Text and video are helpful for voice recognition; images and video are helpful for visual recognition; text and datasets are helpful for obtaining insights and machine analysis.

  • Why is it necessary to access large volumes of copyrighted works?

    Machine learning requires large amounts of material to understand context, increase prediction accuracy, obtain the widest possible insight from the vast amount of the world’s information, and create algorithms that are robust enough to model human perception and decision making.

  • Why aren’t licenses a reasonable solution for commercial TDM?

    Several reasons. First, TDM is like reading, and reading has never required a license from copyright owners, because copyright does not cover the facts, ideas and concepts in works. In practice, very few such licenses are available. It is important that a TDM exception be written in a manner that will not undermine TDM of freely available content. Second, text and data come from thousands of different sources, and there is very rarely any clear identification of what is protected and who owns it. Imposing a licensing requirement on TDM research would therefore require negotiating hundreds of thousands of potential licenses with unidentified owners over unclear rights. This would grind research to a halt and create the possibility of abusive copyright litigation against those engaged in research. Third, efforts to license TDM so far have been sparse and confined to very narrow categories, not nearly expansive enough for useful research across many different fields.

  • Won’t TDM exceptions undermine licensing markets?

    No, because they would not permit anyone to use illegal copies of copyrighted works. In addition, experience has shown that licenses for TDM have been granted only in very narrow and specific fields across a relatively small number of publications, not nearly broad enough to support the type of research that is needed. Requiring a license for all copyrighted works in order to protect a small fraction of rights holders would impose unsustainable transaction costs on researchers.

  • What’s wrong with limiting TDM exceptions to public-interest research?

    A limited exception: creates an artificial boundary for innovation; chills potential innovative research by the private sector; ignores how modern research projects are initiated; creates substantial barriers to public-private research cooperation; and sends a strong signal to startups, businesses, and private researchers to engage in TDM projects outside of Europe.

    Ironically, public interest research organisations have been encouraged by national authorities and European institutions to establish partnerships with the private sector, to counter limited public resources and increase opportunities for students, researchers, and startups. A limited exception in a burgeoning area of science and technology impedes these goals.

  • How does TDM implicate copyright?

    Text and data mining may require automated, incidental storage of copyrighted works to access non-copyrightable information. However, the process and results of TDM do not implicate the underlying expressive value of the copyrighted work, and do not interfere with economic value or business models associated with publications.

  • What is “lawful access”?

    Lawful access means that a work has been made available to a researcher via a subscription or purchase, or has otherwise been made publicly accessible by the conduct of the copyright owner.

  • How are other countries dealing with TDM?

    Most other countries focused on copyright reform in the digital age have recognised or are considering broad text and data mining exceptions to help spur the pace of innovation and research. Singapore has proposed an unrestricted exemption for text and data mining, and Australia’s Law Reform Commission has recommended that Australian reform of copyright law consider non-expressive uses such as text and data mining under a proposed fair use standard. United States courts have repeatedly recognised that non-expressive use of materials, such as text and data mining, constitutes permissible fair use.