Advances in deep parsing of scholarly paper content

  • Authors:
  • Ulrich Schäfer; Bernd Kiefer

  • Affiliations:
  • Language Technology Lab, German Research Center for Artificial Intelligence, Saarbrücken, Germany; Language Technology Lab, German Research Center for Artificial Intelligence, Saarbrücken, Germany

  • Venue:
  • NLP4DL'09/AT4DL'09 Proceedings of the 2009 international conference on Advanced language technologies for digital libraries
  • Year:
  • 2009

Abstract

We report on advances in deep linguistic parsing of the full textual content of 8200 papers from the ACL Anthology, a collection of electronically available scientific papers in the fields of Computational Linguistics and Language Technology. We describe how, by incorporating new techniques, we increase both the speed and the robustness of deep analysis, specifically on long sentences, where deep parsing often failed in earlier approaches. With the current open-source HPSG (Head-driven Phrase Structure Grammar) grammar for English (ERG), we obtain deep parses for more than 85% of the sentences in the 1.5-million-sentence corpus, whereas the earlier approaches achieved only approximately 65% coverage. The resulting sentence-wise semantic representations are used in the Scientist's Workbench, a platform demonstrating the use and benefit of natural language processing (NLP) in supporting scientists and other knowledge workers with faster and better access to digital document content. With the generated NLP annotations, we are able to implement important, novel applications such as robust semantic search, citation classification, and (in the future) question answering and definition exploration.