Recent work on bilingual Word Sense Disambiguation (WSD) has shown that a resource-deprived language (L1) can benefit, via parameter projection, from the annotation work done in a resource-rich language (L2). However, this method assumes the availability of sufficient annotated data in one resource-rich language, which may not always be the case. We instead focus on the situation where there are two resource-deprived languages, each having a very small amount of seed annotated data and a large amount of untagged data. We then use bilingual bootstrapping, in which a model trained on the seed annotated data of L1 annotates the untagged data of L2, and vice versa, using parameter projection. The untagged instances of L1 and L2 that are annotated with high confidence are added to the seed data of the respective languages, and the above process is repeated. Our experiments show that this bilingual bootstrapping algorithm, when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair, performs better than monolingual bootstrapping and significantly reduces annotation cost.
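The bootstrapping loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `train` function is a toy stand-in for a WSD classifier combined with parameter projection, and the confidence threshold of 0.6 is an assumed value chosen for the example.

```python
from collections import Counter

def train(seed):
    """Toy stand-in for a WSD model plus parameter projection:
    predicts the majority sense of the seed data, with its relative
    frequency as the confidence score. (Illustrative assumption.)"""
    counts = Counter(sense for _, sense in seed)
    sense, n = counts.most_common(1)[0]
    return lambda instance: (sense, n / sum(counts.values()))

def annotate(model, seed, untagged, threshold):
    """Label untagged instances of one language using the model trained
    on the other language's seed; keep only high-confidence labels and
    return the still-untagged remainder."""
    remaining = []
    for inst in untagged:
        sense, conf = model(inst)
        if conf >= threshold:
            seed.append((inst, sense))
        else:
            remaining.append(inst)
    return remaining

def bilingual_bootstrap(seed_l1, seed_l2, untagged_l1, untagged_l2,
                        rounds=3, threshold=0.6):
    """Each round: the model trained on L1's seed annotates L2's
    untagged data, and vice versa; confidently labeled instances
    join the respective seed sets, then the models are retrained."""
    for _ in range(rounds):
        untagged_l2 = annotate(train(seed_l1), seed_l2, untagged_l2, threshold)
        untagged_l1 = annotate(train(seed_l2), seed_l1, untagged_l1, threshold)
    return seed_l1, seed_l2, untagged_l1, untagged_l2

# Hypothetical seed/untagged data for the Hindi-Marathi pair.
s1, s2, u1, u2 = bilingual_bootstrap(
    seed_l1=[("hindi_w1", "SENSE_A"), ("hindi_w2", "SENSE_A")],
    seed_l2=[("marathi_w1", "SENSE_A")],
    untagged_l1=["hindi_w3"],
    untagged_l2=["marathi_w2", "marathi_w3"],
    rounds=1,
)
```

The key design point is the cross-lingual direction of supervision: each language's seed set grows only from labels produced by the *other* language's model, which is what distinguishes this scheme from monolingual self-training.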