Adapting a WSJ-trained parser to grammatically noisy text

Authors:
Jennifer Foster;Joachim Wagner;Josef van Genabith
Affiliations:
Dublin City University, Ireland;Dublin City University, Ireland;Dublin City University, Ireland
Venue:
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Year:
2008

Abstract

We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn treebank sentences so that they contain one or more syntactic errors. We evaluate an existing Penn-treebank-trained parser on the ungrammatical treebank to see how it reacts to noise in the form of grammatical errors. We re-train this parser on the training section of the ungrammatical treebank, leading to significantly improved performance on the ungrammatical test sets. We show how a classifier can be used to prevent performance degradation on the original grammatical data.
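
As a rough illustration of the idea of automatically introducing syntactic errors into well-formed sentences, the sketch below applies simple edit operations to a tokenized sentence. The abstract does not say which error operations are used, so the three operations here (deleting a token, duplicating a token, swapping adjacent tokens) and all function names are assumptions made for illustration. The sketch works on the token sequence only; building an ungrammatical treebank would also require adjusting the gold parse tree, which is not shown.

```python
import random


def delete_token(tokens, rng):
    """Drop one randomly chosen token (hypothetical 'missing word' error)."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def duplicate_token(tokens, rng):
    """Repeat one randomly chosen token (hypothetical 'extra word' error)."""
    i = rng.randrange(len(tokens))
    return tokens[:i + 1] + tokens[i:]


def swap_adjacent(tokens, rng):
    """Swap two neighbouring tokens (hypothetical word-order error)."""
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Illustrative operation set; not taken from the paper.
OPERATIONS = (delete_token, duplicate_token, swap_adjacent)


def add_noise(tokens, n_errors=1, seed=0):
    """Return a copy of `tokens` with up to `n_errors` operations applied."""
    rng = random.Random(seed)
    noisy = list(tokens)
    for _ in range(n_errors):
        if len(noisy) < 2:
            # Too short to modify safely; stop early.
            break
        operation = rng.choice(OPERATIONS)
        noisy = operation(noisy, rng)
    return noisy


if __name__ == "__main__":
    sentence = "The treebank is created automatically".split()
    print(" ".join(add_noise(sentence, n_errors=1, seed=1)))
```

Passing a fixed `seed` keeps the generated errors reproducible, which makes it easier to compare parser output on the original and modified versions of the same sentence. Note that swapping two identical neighbouring tokens leaves the sentence unchanged, so a given call is not guaranteed to produce a visible error.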