Journal Content Mining

The Royal Society of Biology recognises the wealth of data within academic journal content and other scholarly resources and the potential for it to be used in innovative ways to further research.  

The Society considers that text and data mining has the potential to:

  • be a cost-effective way of rapidly advancing UK research
  • facilitate improvements of practice in vital areas such as human, veterinary and environmental health
  • generate new kinds of academic output that will enhance the UK economy.

The Society supports researchers’ calls to use text and data mining techniques to further their research and wishes to see this made possible with minimal barriers. The Society therefore welcomes the new exceptions to UK copyright law (effective 1 June 2014), which allow those who have legally acquired access to content to mine it for non-commercial purposes without further restriction.

The Society has identified several additional points that it considers especially pertinent to this subject:

  1. There may be technical issues that need to be resolved on some systems. One is that large numbers of web crawlers accessing and copying data could place strain on existing journal platforms. Publishers need to evaluate this risk and provide pragmatic solutions where they are needed.
  2. Another technical issue is interoperability. At present, different code is required to interrogate each publisher’s platform. The CrossRef Text and Data Mining Service supports standard Application Programming Interfaces (APIs) and data representations, enabling a common content mining system for both researchers and publishers across a number of publications. It is a positive initiative for accessing articles and receiving full text (if entitled). Further work is encouraged to improve researchers’ ability to manipulate and use the text easily once the articles have been received.
  3. Text and data mining should not be used to undermine journals by reproducing, for example, whole articles. It is reassuring that the new copyright exceptions make clear that copyright law on quotations applies in this context, which would limit the amount of text that can be reproduced directly.
  4. To ensure traceability and integrity, any new published output that uses mined text should contain reference links to the original article from which the text was extracted. The Royal Society of Biology welcomes the fact that the new copyright exceptions state that the original work should be acknowledged unless impractical.
  5. Given that the law does not permit licensing that limits taking advantage of the new copyright exceptions, publishers who are involved in developing licences for text and data mining are encouraged to formulate these licences in ways that make text and data mining processes as simple as possible.

This statement was developed by the Research Dissemination Committee of the Royal Society of Biology, which is pleased to be identified as the author of this paper. For more information, contact policy@rsb.org.uk.