A (acronyms)

  • Authors: Manuel Zahariev
  • Affiliations: Simon Fraser University (Canada)
  • Venue: A (acronyms)
  • Year: 2004

Abstract

Acronyms are a significant part, and the most dynamic area, of the lexicon of many languages. Building automated acronym systems poses two problems: acquisition and disambiguation. Acronym acquisition is based on identifying the anaphoric or cataphoric expressions that introduce the meaning of an acronym in text; acronym disambiguation is a word sense disambiguation task in which the expansions of an acronym are its possible senses. It is proposed here that acronyms are a universal phenomenon, occurring in all languages with a written form, and that their formation is governed by linguistic preferences based on regularities at the character, phoneme, word, and phrase levels. A universal explanatory theory of acronyms is presented; it rests on a set of testable hypotheses and is manifested through a set of violable, ordered rules. The theory is developed from examples in fifteen languages with six different writing systems. A dynamic programming algorithm based on this theory is implemented. The algorithm is evaluated on lists of acronym-expansion pairs in Russian, Spanish, Danish, German, English, French, Italian, Dutch, Portuguese, Finnish, and Swedish, and achieves excellent performance. A two-pass greedy algorithm for automatic acronym acquisition is designed, which performs well in specific domains. A hybrid machine learning algorithm, using features generated through dynamic programming acronym-expansion matching, is proposed and performs well on noisy, parsed newspaper text. A machine learning algorithm for acronym sense disambiguation is presented, which is trained and evaluated automatically on information downloaded following search engine lookup. The algorithm achieves good performance on deciding whether an acronym occurs with a certain sense in a given context, and good accuracy when picking the correct sense for an acronym in a given context.
All algorithms presented allow for efficient, readily usable implementations that can be included as components in larger natural language frameworks. Technologies developed have applicability beyond acronym acquisition and disambiguation, to aspects of the more general problems of anaphora resolution and word sense disambiguation, within information extraction or natural language understanding systems.
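
As a rough illustration of what dynamic programming acronym-expansion matching can look like, the sketch below aligns the letters of an acronym, in order, with characters of a candidate expansion and scores the best alignment. It is not the algorithm from the thesis: the abstract does not list its rules, so the single preference used here (a word-initial character is preferred over a word-internal one), the weights 2 and 1, and the function name `match_score` are all assumptions made for the example.

```python
from typing import Optional


def match_score(acronym: str, expansion: str) -> Optional[int]:
    """Return the best alignment score, or None if the acronym's letters
    cannot be found in order inside the expansion.

    Hypothetical scoring: 2 points for a letter matched at the start of a
    word, 1 point for a letter matched inside a word.
    """
    a = acronym.lower()
    e = expansion.lower()
    # A position is word-initial if it starts the string or follows a
    # non-alphanumeric character (space, hyphen, ...).
    initial = [i == 0 or not e[i - 1].isalnum() for i in range(len(e))]

    IMPOSSIBLE = -1  # valid scores are never negative
    # best[i][j]: best score for aligning a[:i] within e[:j].
    best = [[IMPOSSIBLE] * (len(e) + 1) for _ in range(len(a) + 1)]
    for j in range(len(e) + 1):
        best[0][j] = 0  # an empty acronym prefix always aligns, score 0

    for i in range(1, len(a) + 1):
        for j in range(1, len(e) + 1):
            # Option 1: leave expansion character j-1 unused.
            best[i][j] = best[i][j - 1]
            # Option 2: match acronym letter i-1 to expansion character j-1.
            if a[i - 1] == e[j - 1] and best[i - 1][j - 1] != IMPOSSIBLE:
                gain = 2 if initial[j - 1] else 1
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + gain)

    result = best[len(a)][len(e)]
    return None if result == IMPOSSIBLE else result


if __name__ == "__main__":
    for acro, exp in [
        ("SFU", "Simon Fraser University"),
        ("XML", "Extensible Markup Language"),
        ("NASA", "Simon Fraser University"),
    ]:
        print(acro, "|", exp, "|", match_score(acro, exp))
```

Under these assumed weights, a higher score means more acronym letters fall on word boundaries, and `None` means the candidate expansion cannot produce the acronym at all. A score of this kind could serve as one numeric feature for a classifier, in the spirit of the hybrid approach the abstract describes.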