LCC-WSD: system description for English coarse grained all words task at SemEval 2007

  • Authors:
  • Adrian Novischi;Munirathnam Srikanth;Andrew Bennett

  • Affiliations:
  • Language Computer Corp., Richardson, TX;Language Computer Corp., Richardson, TX;Language Computer Corp., Richardson, TX

  • Venue:
  • SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
  • Year:
  • 2007

Abstract

This document describes the Word Sense Disambiguation system that Language Computer Corporation used in the English Coarse-Grained All-Words Task at SemEval 2007. The system is based on two supervised machine learning algorithms: Maximum Entropy and Support Vector Machines. These algorithms were trained on a corpus built from SemCor, the Senseval 2 and 3 all-words and lexical-sample corpora, and the Open Mind Word Expert 1.0 corpus. We used topical, syntactic and semantic features. Some semantic features were derived from WordNet glosses whose semantic relations were tagged manually and automatically as part of the eXtended WordNet project. We also tried to create more training instances from the disambiguated WordNet glosses in the XWN project (XWN, 2003). For words for which we could not build a sense classifier, we backed off to the first sense in WordNet in order to reach 100% coverage. The precision and recall of the overall system are both 81.446%, placing it among the top 5 systems.
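
The back-off strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sense inventory, the sense labels, the rule-based stand-in classifier, and the function names below are all hypothetical, chosen only to show how falling back to the first listed sense yields an answer for every word, and why precision and recall coincide when every instance is attempted.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy sense inventory (not real WordNet data). Senses are
# listed in a fixed order, so index 0 stands in for the "first sense".
SENSE_INVENTORY: Dict[str, List[str]] = {
    "bank": ["bank%financial", "bank%river"],
    "plant": ["plant%factory", "plant%flora"],
}

Classifier = Callable[[List[str]], str]


def bank_classifier(context: List[str]) -> str:
    """Stand-in for a trained per-word classifier (a hypothetical rule)."""
    return "bank%river" if "river" in context else "bank%financial"


# Only some words have a classifier; the rest rely on the back-off.
CLASSIFIERS: Dict[str, Classifier] = {"bank": bank_classifier}


def disambiguate(word: str, context: List[str]) -> Optional[str]:
    """Use the word's classifier if one exists, else back off to sense 0."""
    clf = CLASSIFIERS.get(word)
    if clf is not None:
        return clf(context)
    senses = SENSE_INVENTORY.get(word)
    return senses[0] if senses else None


def precision_recall(
    predictions: List[Optional[str]], gold: List[str]
) -> Tuple[float, float]:
    """Precision counts attempted instances; recall counts all instances."""
    attempted = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(1 for p, g in attempted if p == g)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

With this sketch, `disambiguate("plant", ["green"])` has no classifier to call and so returns the first listed sense. Because the back-off answers every instance found in the inventory, the number of attempted instances equals the total, which is why the two scores are the same number at 100% coverage.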