Value for money: balancing annotation effort, lexicon building and accuracy for multilingual WSD

  • Authors:
  • Mitesh M. Khapra; Saurabh Sohoney; Anup Kulkarni; Pushpak Bhattacharyya

  • Affiliations:
  • Indian Institute of Technology Bombay; Indian Institute of Technology Bombay; Indian Institute of Technology Bombay; Indian Institute of Technology Bombay

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

Sense annotation and lexicon building are costly and demand prudent investment of resources. Recent work on multilingual WSD has shown that the annotation work done for one language (the source language, SL) can be leveraged for another (the target language, TL) by projecting Wordnet and sense-marked corpus parameters from SL to TL. However, that work does not take into account the cost of manually cross-linking the words within aligned synsets. Nor does it answer the question, "Can better accuracy be achieved if a user is willing to pay additional money?" We propose a measure for cost-benefit analysis that quantifies the "value for money" earned in terms of accuracy by investing in annotation effort and lexicon building. Two key ideas explored in this paper are (i) the use of a probabilistic cross-linking model to reduce manual cross-linking effort and (ii) the use of selective sampling to inject a few training examples for hard-to-disambiguate words from the target language to boost accuracy.
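
The abstract does not spell out how hard-to-disambiguate words are chosen, so the following is only an illustrative sketch of selective sampling, not the authors' procedure. It assumes a generic uncertainty criterion (Shannon entropy of a model's predicted sense distribution) and a fixed annotation budget; the function names, the `budget` parameter, and the toy probabilities are all hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in bits) of a predicted sense distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_hard_words(sense_probs: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` words with the most uncertain sense predictions.

    Assumption: high entropy is used as a stand-in for "hard to disambiguate".
    """
    ranked = sorted(sense_probs, key=lambda w: entropy(sense_probs[w]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs: word -> probabilities over its candidate senses.
toy = {
    "bank": [0.5, 0.5],
    "river": [0.99, 0.01],
    "plant": [0.4, 0.3, 0.3],
}

# Words chosen for manual annotation in the target language.
hard = select_hard_words(toy, budget=2)
```

Under these assumptions, only the selected words would receive a few manually annotated target-language examples, while the remaining words would rely on parameters projected from the source language.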