A maximum entropy approach to natural language processing
Computational Linguistics
Maximum entropy models for natural language ambiguity resolution
Maximum entropy models for natural language ambiguity resolution
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Feature-rich part-of-speech tagging for morphologically complex languages: application to Bulgarian
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Hi-index | 0.00 |
We describe our efforts in adapting five basic natural language processing components to Bulgar-ian: sentence splitter, tokenizer, part-of-speech tagger, chunker, and syntactic parser. The components were originally developed for English within OpenNLP, an open source maximum entropy based machine learning toolkit, and were retrained based on manually annotated training data from the BulTreeBank. The evaluation results show an F1 score of 92.54% for the sentence splitter, 98.49% for the tokenizer, 94.43% for the part-of-speech tagger, 84.60% for the chunker, and 77.56% for the syntactic parser, which should be interpreted as baseline for Bulgarian.