Automatic word sense disambiguation and construction identification based on corpus multilevel annotation

  • Authors:
  • Olga Lyashevskaya;Olga Mitrofanova;Maria Grachkova;Sergey Romanov;Anastasia Shimorina;Alexandra Shurygina

  • Affiliations:
  • NRU Higher School of Economics, Moscow;St. Petersburg State University, St. Petersburg, Russia;St. Petersburg State University, St. Petersburg, Russia;St. Petersburg State University, St. Petersburg, Russia;St. Petersburg State University, St. Petersburg, Russia;St. Petersburg State University, St. Petersburg, Russia

  • Venue:
  • TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
  • Year:
  • 2011

Abstract

The research project reported in this paper aims at the automatic extraction of linguistic information from contexts in the Russian National Corpus (RNC) and its subsequent use in building a comprehensive lexicographic resource, the Index of Russian lexical constructions. The proposed approach relies on automatic context classification for word sense disambiguation (WSD) and construction identification (CxI). The automatic context processing procedure takes into account the following types of contextual information represented in the RNC multilevel annotation: lexical (lemma) tags (lex), morphological (grammatical) tags (gr), semantic (taxonomy) tags (sem), and combinations of these tag types. Multiple experiments on WSD and CxI are performed using representative RNC context samples for nouns. In each series of experiments we analyze (1) different context markers of the meaning of target words and (2) constructions that include context markers and target words.
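As an illustration of how context classification over several annotation layers might work, the following minimal sketch builds a bag of features from the lex, gr, and sem tags of context tokens and assigns the sense whose labelled contexts share the most features. It is not the authors' method: the function names, the overlap-based scoring, the token format, and all tag values are assumptions invented for this example, and the toy data does not come from the RNC.

```python
from collections import Counter

def context_features(tokens, layers=("lex", "gr", "sem")):
    """Return a bag of 'layer=value' features for the chosen annotation layers.

    Each token is a hypothetical dict with optional 'lex', 'gr', 'sem' keys.
    """
    feats = Counter()
    for tok in tokens:
        for layer in layers:
            value = tok.get(layer)
            if value:
                feats[f"{layer}={value}"] += 1
    return feats

def overlap(a, b):
    """Size of the multiset intersection of two feature bags."""
    return sum((a & b).values())

def classify(context, labelled, layers=("lex", "gr", "sem")):
    """Pick the sense whose labelled contexts share the most features.

    A simple overlap score, used here only as a stand-in for a real classifier.
    """
    target = context_features(context, layers)
    scores = Counter()
    for sense, tokens in labelled:
        scores[sense] += overlap(target, context_features(tokens, layers))
    return max(scores, key=scores.get)

# Invented toy data for an ambiguous noun with two senses ("onion" vs "bow").
# Lemmas are transliterated; the gr and sem tag values are made up.
labelled = [
    ("onion", [{"lex": "rezat'", "gr": "V", "sem": "t:impact"}]),
    ("bow", [{"lex": "strelyat'", "gr": "V", "sem": "t:move"}]),
]
context = [{"lex": "chistit'", "gr": "V", "sem": "t:impact"}]

print(classify(context, labelled))                   # all three layers
print(classify(context, labelled, layers=("sem",)))  # semantic layer only
```

Passing a different `layers` tuple restricts the features to a single tag type or to a combination of types, which mirrors the idea of comparing lex, gr, sem, and their combinations as context markers.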