Advances in deep parsing of scholarly paper content

Authors:
Ulrich Schäfer; Bernd Kiefer
Affiliations:
Language Technology Lab, German Research Center for Artificial Intelligence, Saarbrücken, Germany; Language Technology Lab, German Research Center for Artificial Intelligence, Saarbrücken, Germany
Venue:
NLP4DL'09/AT4DL'09 Proceedings of the 2009 international conference on Advanced language technologies for digital libraries
Year:
2009

Abstract

We report on advances in deep linguistic parsing of the full textual content of 8200 papers from the ACL Anthology, a collection of electronically available scientific papers in the fields of Computational Linguistics and Language Technology. We describe how incorporating new techniques increases both the speed and the robustness of deep analysis, specifically on long sentences, where deep parsing often failed in former approaches. With the current open-source HPSG (Head-driven Phrase Structure Grammar) grammar for English, the English Resource Grammar (ERG), we obtain deep parses for more than 85% of the sentences in the 1.5-million-sentence corpus, whereas the former approaches achieved only approximately 65% coverage. The resulting sentence-wise semantic representations are used in the Scientist's Workbench, a platform demonstrating the use and benefit of natural language processing (NLP) to support scientists or other knowledge workers with faster and better access to digital document content. With the generated NLP annotations, we are able to implement important, novel applications such as robust semantic search and citation classification, and (in the future) question answering and definition exploration.