Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Exploiting auxiliary distributions in stochastic unification-based grammars
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Feature-rich part-of-speech tagging with a cyclic dependency network
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Evaluation and extension of maximum entropy models with inequality constraints
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Developing a robust part-of-speech tagger for biomedical text
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Adapting a probabilistic disambiguation model of an HPSG parser to a new domain
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
A token centric part-of-speech tagger for biomedical text
AIME'11 Proceedings of the 13th conference on Artificial intelligence in medicine
Two large corpora are available for the domain of biomedical research abstracts: GENIA (Kim et al. 2003) and Penn BioIE (Kulick et al. 2004). Both are drawn largely from the human domain, and it is unknown how well systems trained on them perform when applied to abstracts dealing with other species. In machine-learning-based systems, retraining the model with additional corpora from the target domain has achieved promising results (e.g. Tsuruoka et al. 2005, Lease et al. 2005). In this paper, we compare two methods for adapting POS taggers trained on the GENIA and Penn BioIE corpora to the Drosophila melanogaster (fruit fly) domain.