Cross-lingual adaptation as a baseline: adapting maximum entropy models to Bulgarian

  • Authors:
  • Georgi Georgiev;Preslav Nakov;Petya Osenova;Kiril Simov

  • Affiliations:
  • Ontotext AD, Sofia, Bulgaria;National University of Singapore, Singapore;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria

  • Venue:
  • AdaptLRTtoND '09 Proceedings of the Workshop on Adaptation of Language Resources and Technology to New Domains
  • Year:
  • 2009


Abstract

We describe our efforts in adapting five basic natural language processing components to Bulgarian: a sentence splitter, a tokenizer, a part-of-speech tagger, a chunker, and a syntactic parser. The components were originally developed for English within OpenNLP, an open-source maximum-entropy-based machine learning toolkit, and were retrained on manually annotated training data from the BulTreeBank. The evaluation results show an F1 score of 92.54% for the sentence splitter, 98.49% for the tokenizer, 94.43% for the part-of-speech tagger, 84.60% for the chunker, and 77.56% for the syntactic parser, which should be interpreted as a baseline for Bulgarian.
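The components above are all maximum entropy (log-linear) classifiers: each context is encoded as a set of features, and a conditional distribution over labels is obtained by a softmax over feature weights learned from the annotated data. The following is a minimal, self-contained sketch of that modeling idea on a toy POS-style task; the feature names, labels, and training data are purely illustrative (not from OpenNLP or the BulTreeBank), and plain gradient ascent stands in for OpenNLP's GIS training procedure.

```python
import math
from collections import defaultdict

# Toy training data: (feature dict, label) pairs for a tiny POS-style task.
# Feature names and labels are hypothetical, chosen only for illustration.
TRAIN = [
    ({"suffix=a": 1.0, "prev=DET": 1.0}, "NOUN"),
    ({"suffix=a": 1.0, "prev=NOUN": 1.0}, "VERB"),
    ({"suffix=e": 1.0, "prev=DET": 1.0}, "NOUN"),
    ({"suffix=e": 1.0, "prev=NOUN": 1.0}, "VERB"),
]
LABELS = sorted({y for _, y in TRAIN})

def probs(weights, feats):
    """Maxent conditional distribution: softmax over log-linear scores."""
    scores = {y: sum(weights[(y, f)] * v for f, v in feats.items())
              for y in LABELS}
    m = max(scores.values())                      # for numerical stability
    exp = {y: math.exp(scores[y] - m) for y in LABELS}
    z = sum(exp.values())
    return {y: exp[y] / z for y in LABELS}

def train(data, epochs=100, lr=0.5):
    """Gradient ascent on the conditional log-likelihood."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            p = probs(w, feats)
            for f, v in feats.items():
                for y in LABELS:
                    # Observed minus expected feature count.
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    w[(y, f)] += lr * grad * v
    return w

weights = train(TRAIN)
pred = max(probs(weights, {"suffix=a": 1.0, "prev=DET": 1.0}).items(),
           key=lambda kv: kv[1])[0]
```

In this sketch the `prev=` feature is the informative one, so the trained model tags the `prev=DET` context as `NOUN`. The real components differ only in scale: richer feature sets per task (token shapes, surrounding words, tag histories) and far more training data.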