TTP: a fast and robust parser for natural language

Author: Tomek Strzalkowski
Affiliation: New York University, New York, NY
Venue: COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 1
Year: 1992

Abstract

In this paper we describe TTP, a fast and robust natural language parser that analyzes written text and generates regularized parse structures for sentences and phrases at a speed of approximately 0.5 sec/sentence, or 44 words per second. The parser is based on a wide-coverage grammar for English, developed by New York University's Linguistic String Project, and it uses the machine-readable version of the Oxford Advanced Learner's Dictionary as the source of its basic vocabulary. The parser operates on stochastically tagged text and contains a powerful skip-and-fit recovery mechanism that allows it to deal with extra-grammatical input and to operate effectively under severe time pressure. Empirical experiments testing the parser's speed and accuracy were performed on several collections: a collection of technical abstracts (CACM-3204), a corpus of news messages (MUC-3), a selection from the ACM Computer Library database, and a collection of Wall Street Journal articles, approximately 50 million words in total.
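
The two speed figures quoted in the abstract can be related with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything taken from the paper: the function names are hypothetical, and the total-time estimate assumes, purely for illustration, that all 50 million words were processed on a single machine at the quoted average rate, which the abstract does not state.

```python
# Figures quoted in the abstract.
SECONDS_PER_SENTENCE = 0.5    # approximate parse time per sentence
WORDS_PER_SECOND = 44         # approximate parse rate
TOTAL_WORDS = 50_000_000      # approximate size of all test collections


def implied_words_per_sentence(sec_per_sentence, words_per_sec):
    """Average sentence length implied by the two quoted speed figures."""
    return sec_per_sentence * words_per_sec


def estimated_parse_time_hours(total_words, words_per_sec):
    """Hours needed to process total_words at a constant rate.

    Assumption (not from the paper): one machine, constant average speed.
    """
    return total_words / words_per_sec / 3600


if __name__ == "__main__":
    length = implied_words_per_sentence(SECONDS_PER_SENTENCE, WORDS_PER_SECOND)
    hours = estimated_parse_time_hours(TOTAL_WORDS, WORDS_PER_SECOND)
    print(f"implied average sentence length: {length:.0f} words")
    print(f"estimated time for {TOTAL_WORDS:,} words: "
          f"{hours:.0f} hours ({hours / 24:.1f} days)")
```

Under these assumptions the two rates imply an average sentence of about 22 words, and a full pass over 50 million words would take on the order of 316 hours, or roughly 13 days.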