Parsing morphologically rich languages is difficult for several reasons, among them the greater variability in structure that results from less rigid word order constraints and the larger number of distinct lexical forms. Both properties can lead to sparse data problems for statistical parsing. We present a simple approach for addressing these issues. Our approach uses self-training on instances selected according to their similarity to the annotated data. Our similarity measure is based on the perplexity of the part-of-speech trigrams of a new instance, measured against the annotated training data. Preliminary results show that our method outperforms a self-training setting in which instances are simply selected in order of occurrence in the corpus, and they suggest that self-training is a cheap and effective method for improving parsing accuracy for morphologically rich languages.
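The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the add-one smoothing, the sentence padding symbols, and the names `PosTrigramModel` and `select_instances` are all assumptions made for illustration, since the abstract does not specify how the trigram model is estimated. The sketch trains a part-of-speech trigram model on the tag sequences of the annotated data and ranks unannotated candidates by perplexity, lowest (most similar) first.

```python
import math
from collections import Counter

# Hypothetical padding symbols; the abstract does not say how boundaries are handled.
BOS, EOS = "<s>", "</s>"


def trigrams(tags):
    """Return the POS trigrams of one sentence, padded at both ends."""
    padded = [BOS, BOS] + list(tags) + [EOS]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class PosTrigramModel:
    """POS trigram model with add-one smoothing (the smoothing is an assumption)."""

    def __init__(self, tagged_sentences):
        self.tri = Counter()   # counts of (t1, t2, t3)
        self.bi = Counter()    # counts of the history (t1, t2)
        vocab = {EOS}
        for tags in tagged_sentences:
            vocab.update(tags)
            for gram in trigrams(tags):
                self.tri[gram] += 1
                self.bi[gram[:2]] += 1
        self.vocab_size = len(vocab)

    def prob(self, gram):
        # Add-one estimate of P(t3 | t1, t2). Tags never seen in training
        # receive the same small probability; this is enough for ranking.
        return (self.tri[gram] + 1) / (self.bi[gram[:2]] + self.vocab_size)

    def perplexity(self, tags):
        """Per-trigram perplexity of one tag sequence under the model."""
        grams = trigrams(tags)
        log_sum = sum(math.log(self.prob(g)) for g in grams)
        return math.exp(-log_sum / len(grams))


def select_instances(model, candidates, k):
    """Pick the k candidate tag sequences most similar to the training data,
    i.e. those with the lowest perplexity."""
    return sorted(candidates, key=model.perplexity)[:k]


if __name__ == "__main__":
    # Toy annotated data; the tag names are only illustrative.
    annotated = [
        ["ART", "NN", "VVFIN"],
        ["ART", "NN", "VVFIN"],
        ["PPER", "VVFIN", "ART", "NN"],
    ]
    model = PosTrigramModel(annotated)
    pool = [["VVFIN", "VVFIN", "VVFIN"], ["ART", "NN", "VVFIN"]]
    for tags in select_instances(model, pool, k=2):
        print(round(model.perplexity(tags), 2), tags)
```

In a self-training loop, the selected instances would then be parsed automatically and added to the training set. The baseline mentioned in the abstract corresponds to taking the first `k` candidates in corpus order instead of sorting by perplexity.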