CFILT: Resource conscious approaches for all-words domain specific WSD

  • Authors: Anup Kulkarni, Mitesh M. Khapra, Saurabh Sohoney, Pushpak Bhattacharyya
  • Affiliation (all authors): Indian Institute of Technology Bombay, Powai, Mumbai, India
  • Venue: SemEval '10: Proceedings of the 5th International Workshop on Semantic Evaluation
  • Year: 2010

Abstract

We describe two approaches to all-words Word Sense Disambiguation (WSD) in a specific domain. The first is a knowledge-based approach that extracts domain-specific largest connected components from the WordNet graph by exploiting the semantic relations between all candidate synsets appearing in a domain-specific untagged corpus. Given a test word, disambiguation is performed by considering only those candidate synsets that belong to the top-k largest connected components. The second is a weakly supervised approach that relies on the "One Sense Per Domain" heuristic and uses a few hand-labeled examples for the most frequently appearing words in the target domain. Once the most frequent words have been disambiguated, they provide strong clues for disambiguating the other words in a sentence through an iterative disambiguation algorithm. Our weakly supervised system gave the best performance among all systems that participated in the task, even when it used as few as 100 hand-labeled examples from the target domain.
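
The component-filtering step of the knowledge-based approach can be illustrated with a minimal sketch. It is not the authors' implementation: the synset identifiers, the edge list, the helper names `connected_components` and `filter_candidates`, and the fallback to all candidates when none survive the filter are assumptions made for illustration. A toy undirected graph stands in for the WordNet relations among the candidate synsets found in the domain corpus.

```python
from collections import defaultdict, deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    nodes = set(nodes)
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):  # sorted only to make the traversal deterministic
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first search from the unvisited start node
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour in nodes and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)

    components.sort(key=len, reverse=True)
    return components

def filter_candidates(candidates, components, k):
    """Keep only candidate synsets that lie in the k largest components."""
    allowed = set().union(*components[:k])
    kept = [synset for synset in candidates if synset in allowed]
    # Assumption: if the filter removes every candidate, fall back to all of them.
    return kept or list(candidates)

# Hypothetical synset ids and relations; not taken from WordNet or the paper.
NODES = {"bank#river", "river#1", "water#1", "shore#1", "bank#finance", "money#1"}
EDGES = [
    ("bank#river", "river#1"),
    ("river#1", "water#1"),
    ("water#1", "shore#1"),
    ("bank#finance", "money#1"),
]

components = connected_components(NODES, EDGES)
print(filter_candidates(["bank#river", "bank#finance"], components, k=1))
# prints ['bank#river']
```

In this toy graph the river-related synsets form the largest component, so with k = 1 only the river sense of "bank" remains a candidate. Choosing among the surviving candidates is a separate step that the sketch does not cover.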