The automated retrieval console (ARC): open source software for streamlining the process of natural language processing

  • Authors:
  • Leonard W. D'Avolio; Thien Nguyen; Louis Fiore

  • Affiliations:
  • VA Boston Healthcare System, Jamaica Plain, MA, USA; VA Boston Healthcare System, Jamaica Plain, MA, USA; VA Boston Healthcare System, Jamaica Plain, MA & Boston University, Boston, MA, USA

  • Venue:
  • Proceedings of the 1st ACM International Health Informatics Symposium
  • Year:
  • 2010

Abstract

Open source natural language processing (NLP) frameworks have made it easier for NLP developers and researchers to develop more reusable and modular components and to capitalize on the work of others. With the Automated Retrieval Console (ARC) we attempt to build upon this foundation by streamlining the many processes surrounding the development, evaluation, and deployment of natural language processing technologies. Toward this end, ARC offers graphical user interfaces to facilitate corpus import, reference set creation, annotation, and inter-annotator agreement calculation. To speed task-specific information extraction development, ARC combines NLP-generated features from UIMA pipelines with machine learning classifiers and calculates performance statistics against a reference set. We also use ARC to explore automated algorithm creation for specific information extraction tasks in an effort to reduce the need for custom code and rules development. We present a detailed description of the ideas implemented in this proof-of-concept and a brief overview of two empirical evaluations.
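
The abstract mentions two calculations: inter-annotator agreement and performance statistics against a reference set. The following is a minimal sketch of what such calculations commonly look like, assuming Cohen's kappa as the agreement measure and precision, recall, and F1 as the performance statistics. The abstract does not say which measures ARC uses, and this is not ARC's code; the function names `cohen_kappa` and `precision_recall_f1` and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same documents.

    Assumption: one categorical label per document per annotator.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of documents labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def precision_recall_f1(reference, predicted, positive=1):
    """Score predicted labels against a reference set for one positive class."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for four documents (1 = relevant, 0 = not relevant).
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)

reference = [1, 1, 0, 0]
classifier_output = [1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(reference, classifier_output)
```

In a workflow like the one described, agreement would be computed between annotators while the reference set is built, and the performance statistics would then compare a classifier's output against that finished reference set.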