GetItFull – a tool for downloading and pre-processing full-text journal articles

  • Authors:
  • Jeyakumar Natarajan;Cliff Haines;Brian Berglund;Catherine DeSesa;Catherine J. Hack;Werner Dubitzky;Eric G. Bremer

  • Affiliations:
  • Bioinformatics Research Group, University of Ulster, UK;CTH Technologies, Inc., Oak Brook Terrace, IL, USA and SPSS, Inc, Chicago, IL;CTH Technologies, Inc., Oak Brook Terrace, IL, USA and SPSS, Inc, Chicago, IL;Brain Tumor Research Program, Children's Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL;Bioinformatics Research Group, University of Ulster, UK;Bioinformatics Research Group, University of Ulster, UK;Brain Tumor Research Program, Children's Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL

  • Venue:
  • KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
  • Year:
  • 2006

Abstract

Automated analysis of full-text life science research articles and technical documents is becoming increasingly important. Accessing and processing full text is considerably more complex than working with abstracts. GetItFull is a tool for downloading and pre-processing full-text journal articles. It automatically connects to a journal's Web site, downloads the journal content, and performs various commonly used pre-processing steps. The output comprises a structured XML document for each article, with tags identifying the various sections and journal information. This output may then be used as the basis for text mining applications or exported to a database for further processing.
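
To illustrate the kind of output the abstract describes, the following is a minimal sketch of writing one structured XML document per article. It is not GetItFull's actual schema or code: the function name article_to_xml, the element names (article, journal, section), and the name attribute are all hypothetical, chosen only to show journal information and section tags living in one document.

```python
import xml.etree.ElementTree as ET


def article_to_xml(journal_info, sections):
    """Build one XML document for a single article.

    journal_info: dict of journal metadata; keys are assumed to be
                  valid XML element names (e.g. "title", "year").
    sections:     list of (section_name, section_text) pairs, in order.
    All tag and attribute names here are hypothetical.
    """
    root = ET.Element("article")

    # Journal-level information goes under its own element.
    journal = ET.SubElement(root, "journal")
    for key, value in journal_info.items():
        ET.SubElement(journal, key).text = value

    # Each article section becomes a tagged element, preserving order.
    for name, text in sections:
        section = ET.SubElement(root, "section", name=name)
        section.text = text

    # ElementTree escapes characters such as "&" and "<" on output.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(article_to_xml(
        {"title": "Example Journal", "year": "2006"},
        [("abstract", "Short summary."), ("methods", "What was done.")],
    ))
```

A document in this form can be read back with any XML parser, which is what makes it a convenient starting point for text mining or for loading into a database.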