Treebanks gone bad: Parser evaluation and retraining using a treebank of ungrammatical sentences
International Journal on Document Analysis and Recognition
We present a robust parser trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn Treebank sentences so that they contain one or more syntactic errors. We evaluate an existing Penn-Treebank-trained parser on the ungrammatical treebank to see how it reacts to noise in the form of grammatical errors. We then retrain this parser on the training section of the ungrammatical treebank, which significantly improves performance on the ungrammatical test sets. Finally, we show how a classifier can be used to prevent performance degradation on the original grammatical data.
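To make the idea of an automatically created ungrammatical treebank concrete, the following is a minimal sketch of one possible error-insertion step: deleting a single word from a bracketed Penn-style tree and pruning any constituent left empty. The abstract does not say which error types or tree-repair rules were used, so the missing-word error, the pruning rule, and every function name here (`parse_tree`, `delete_word`, `make_ungrammatical`) are illustrative assumptions, not the author's procedure.

```python
import random


def parse_tree(s):
    """Parse a bracketed tree string into nested lists: [label, child, ...].

    Words are plain strings; a preterminal looks like ["DT", "the"].
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return [label] + children

    return read()


def to_string(tree):
    """Serialize a nested-list tree back to bracketed form."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_string(c) for c in tree[1:]]) + ")"


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    out = []
    for child in tree[1:]:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


def delete_word(tree, index):
    """Return a copy of the tree without its index-th word.

    Assumption (hypothetical repair rule): a constituent that loses all of
    its children is removed as well. Returns None if nothing is left.
    """
    counter = [0]

    def walk(node):
        # Preterminal: a tag dominating exactly one word.
        if len(node) == 2 and isinstance(node[1], str):
            keep = counter[0] != index
            counter[0] += 1
            return list(node) if keep else None
        kept = [c for c in (walk(child) for child in node[1:]) if c is not None]
        return [node[0]] + kept if kept else None

    return walk(tree)


def make_ungrammatical(tree, rng=random):
    """Insert one missing-word error at a random position (illustrative only)."""
    n = len(leaves(tree))
    if n < 2:
        return tree  # too short to delete a word and keep a sentence
    return delete_word(tree, rng.randrange(n))
```

Applied to `(S (NP (DT the) (NN dog)) (VP (VBD barked)))`, deleting the first word leaves a tree over "dog barked" that keeps the original bracketing for the remaining words. Pairing each noisy sentence with a tree derived from the gold tree in this way is what would let a parser be both evaluated and retrained on the noisy data.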