Linguistic preprocessing for distributional classification of words

  • Authors:
  • Viktor Pekar

  • Affiliations:
  • University of Wolverhampton, MB, Wolverhampton, UK

  • Venue:
  • ElectricDict '04 Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries
  • Year:
  • 2004

Abstract

The paper is concerned with the automatic classification of new lexical items into synonymic sets on the basis of their cooccurrence data obtained from a corpus. Our goal is to examine the impact that different types of linguistic preprocessing of the cooccurrence material have on classification accuracy. The paper compares several preprocessing techniques frequently used for this and similar tasks and draws conclusions about their relative merits. We find that a carefully chosen preprocessing procedure achieves a relative effectiveness improvement of up to 88%, depending on the classification method, compared with window-based context delineation, while using a much smaller feature space.
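
As a rough illustration of the window-based context delineation that serves as the baseline, the following minimal sketch counts the words occurring within a fixed distance of each target word. It is not the paper's implementation: the function name `window_cooccurrences`, the toy sentence, the window size, and the stop-word filter (a hypothetical stand-in for one possible preprocessing step) are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def window_cooccurrences(tokens, window=2, stopwords=frozenset()):
    """Count context words within `window` positions of each target token.

    `stopwords` is an illustrative preprocessing filter (an assumption,
    not a technique taken from the paper): context words in this set
    are dropped from the feature space.
    """
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue  # a token is not its own context
            context = tokens[j]
            if context in stopwords:
                continue  # filtered out by the preprocessing step
            counts[target][context] += 1
    return counts


def feature_space(counts):
    """Return the set of distinct context words used as features."""
    return {context for vector in counts.values() for context in vector}


# Toy corpus (hypothetical data, for illustration only).
tokens = "the cat chased the mouse".split()
raw = window_cooccurrences(tokens, window=1)
filtered = window_cooccurrences(tokens, window=1, stopwords={"the"})
```

On this toy input the filtered variant yields a smaller feature space than the raw window, which is the kind of effect on feature-space size that the abstract refers to; the effect on classification accuracy is not something this sketch measures.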