Scaled log likelihood ratios for the detection of abbreviations in text corpora

  • Authors: Tibor Kiss; Jan Strunk
  • Affiliations: Ruhr-Universität Bochum, Bochum; Ruhr-Universität Bochum, Bochum
  • Venue: COLING '02: Proceedings of the 19th International Conference on Computational Linguistics, Volume 2
  • Year: 2002

Abstract

We describe a language-independent, flexible, and accurate method for detecting abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as a collocation of a word and the period that follows it, and can therefore be identified with methods for collocation detection such as the log likelihood ratio. Although the log likelihood ratio is known to achieve good recall, its precision is poor. We employ scaling factors that substantially improve precision. Experiments with English and German corpora show that abbreviations can be detected with high accuracy.
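
To make the collocation view concrete, the following is a minimal sketch of the unscaled log likelihood ratio for a candidate word and a following period, using the standard binomial formulation of the test. The function names, argument names, and counts are assumptions chosen for illustration and are not taken from the paper. The scaling factors mentioned in the abstract are not specified there, so they are deliberately left out.

```python
import math


def _log_binomial(k, n, p):
    """Log of p**k * (1 - p)**(n - k), treating 0 * log(0) as 0."""
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)


def log_likelihood_ratio(c_word, c_period, c_word_period, n_tokens):
    """Return -2 log(lambda) for the pair (word, following period).

    c_word         occurrences of the candidate word (hypothetical count)
    c_period       occurrences of the period token
    c_word_period  occurrences of the word directly followed by a period
    n_tokens       total number of tokens in the corpus

    Assumes 0 < c_word < n_tokens and mutually consistent counts.
    """
    # Null hypothesis: a period is equally likely after any token.
    p = c_period / n_tokens
    # Alternative: the period rate after the word differs from the rate elsewhere.
    p1 = c_word_period / c_word
    p2 = (c_period - c_word_period) / (n_tokens - c_word)

    log_lambda = (
        _log_binomial(c_word_period, c_word, p)
        + _log_binomial(c_period - c_word_period, n_tokens - c_word, p)
        - _log_binomial(c_word_period, c_word, p1)
        - _log_binomial(c_period - c_word_period, n_tokens - c_word, p2)
    )
    return -2.0 * log_lambda


if __name__ == "__main__":
    # Illustrative counts only: a word followed by a period 95 times out of 100
    # versus a word followed by a period at the corpus-wide rate.
    print(log_likelihood_ratio(100, 1000, 95, 10000))
    print(log_likelihood_ratio(100, 1000, 10, 10000))
```

A word that is followed by a period far more often than the corpus-wide rate receives a high score, while a word followed by a period at exactly that rate scores zero. The statistic is two-sided, so a word that precedes a period unusually rarely also scores high.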