Subject Classification in the Oxford English Dictionary

  • Authors:
  • Zarrin Langari; Frank Wm. Tompa

  • Venue:
  • ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
  • Year:
  • 2001

Abstract

The Oxford English Dictionary is a valuable source of lexical information and a rich testing ground for mining highly structured text. Each entry is organized into a hierarchy of senses, which include definitions, labels and cited quotations. Subject labels identify the subject classification of a sense; for example, they signal how a word may be used in Anthropology, Music or Computing. Unfortunately, subject labeling in the dictionary is incomplete. To overcome this incompleteness, we attempt to classify the senses (i.e., definitions) in the dictionary by their subjects, using the citations as an information guide. We report on four different approaches: K Nearest Neighbors, a standard classification technique; Term Weighting, an information retrieval method for text; Naïve Bayes, a probabilistic method; and Expectation Maximization, an iterative probabilistic method. The experimental performance of these methods is compared using standard classification metrics.
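The abstract does not describe how each method is implemented. Purely as an illustration, here is a minimal sketch of a generic multinomial Naïve Bayes classifier of the kind named above. It assumes that each sense is represented by the bag of words drawn from its cited quotations, and that senses which already carry a subject label serve as training data. The function names `train_naive_bayes` and `classify` and the tiny training set are hypothetical. They are not taken from the paper or from the OED.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_senses):
    """Count classes and words per class.

    labelled_senses: list of (subject_label, quotation_tokens) pairs.
    Assumption: a sense is represented only by the words of its quotations.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in labelled_senses:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, tokens):
    """Return the subject label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_senses = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of labelled senses carrying this subject.
        score = math.log(class_counts[label] / total_senses)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing keeps unseen (word, class) pairs nonzero.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: quotation words for senses with known subject labels.
training = [
    ("Music", "the orchestra played the sonata in a minor key".split()),
    ("Music", "she sang the melody while the choir held the chord".split()),
    ("Computing", "the program stored the file on the disk".split()),
    ("Computing", "compile the program and run the binary".split()),
]
model = train_naive_bayes(training)

if __name__ == "__main__":
    # An unlabelled sense whose quotation mentions a choir and a melody.
    print(classify(model, "the choir sang a new melody".split()))
```

Working in log space avoids numeric underflow when a sense has many quotation words. An unlabelled sense is then assigned the subject whose word distribution best explains its quotations.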