Acronyms form a significant, and the most dynamic, area of the lexicon of many languages. Building automated acronym systems poses two problems: acquisition and disambiguation. Acronym acquisition is based on identifying the anaphoric or cataphoric expressions that introduce the meaning of an acronym in text; acronym disambiguation is a word sense disambiguation task in which the expansions of an acronym are its possible senses. It is proposed here that acronyms are a universal phenomenon, occurring in all languages with a written form, and that their formation is governed by linguistic preferences based on regularities at the character, phoneme, word and phrase levels. A universal explanatory theory of acronyms is presented, which rests on a set of testable hypotheses and is manifested through a set of violable, ordered rules. The theory is developed from examples in fifteen languages with six different writing systems. A dynamic programming algorithm based on this explanatory theory is implemented. The algorithm is evaluated on lists of acronym-expansion pairs in Russian, Spanish, Danish, German, English, French, Italian, Dutch, Portuguese, Finnish, and Swedish, and achieves excellent performance. A two-pass greedy algorithm for automatic acronym acquisition is designed, which results in good performance for specific domains. A hybrid machine learning algorithm, using features generated through dynamic programming acronym-expansion matching, is proposed and results in good performance on noisy, parsed newspaper text. A machine learning algorithm for acronym sense disambiguation is presented, which is trained and evaluated automatically on information downloaded following search engine lookup. The algorithm achieves good performance on deciding whether an acronym occurs with a certain sense in a given context, and good accuracy when picking the correct sense for an acronym in a given context.
All algorithms presented allow for efficient, readily usable implementations that can be included as components in larger natural language frameworks. The technologies developed are applicable beyond acronym acquisition and disambiguation, to aspects of the more general problems of anaphora resolution and word sense disambiguation within information extraction or natural language understanding systems.
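
To make the idea of dynamic programming acronym-expansion matching concrete, here is a minimal Python sketch. It is not the algorithm from this work: the abstract does not state the actual rules or their ordering, so the function name `match_score` and the weights (2 for a word-initial character, 1 for a word-internal one) are assumptions chosen purely for illustration. The sketch aligns each acronym letter, in order, with a character of the candidate expansion and prefers word-initial characters, which is one simple way to express a violable preference.

```python
from typing import Optional

NO_MATCH = -1  # sentinel: no alignment exists for this table cell


def match_score(acronym: str, expansion: str) -> Optional[int]:
    """Return the best alignment score, or None if no alignment exists.

    Hypothetical scoring (an assumption, not taken from the source):
    a letter matched at the start of a word earns 2, elsewhere 1.
    """
    a = acronym.lower()
    e = expansion.lower()
    # A position is word-initial if it is first or follows a non-alphanumeric.
    initial = [i == 0 or not e[i - 1].isalnum() for i in range(len(e))]

    # best[i][j]: best score aligning a[:i] within e[:j], or NO_MATCH.
    best = [[NO_MATCH] * (len(e) + 1) for _ in range(len(a) + 1)]
    for j in range(len(e) + 1):
        best[0][j] = 0  # an empty acronym prefix aligns with anything

    for i in range(1, len(a) + 1):
        for j in range(1, len(e) + 1):
            # Option 1: leave expansion character e[j-1] unused.
            score = best[i][j - 1]
            # Option 2: align a[i-1] with e[j-1] when the characters agree.
            if a[i - 1] == e[j - 1] and best[i - 1][j - 1] != NO_MATCH:
                bonus = 2 if initial[j - 1] else 1
                score = max(score, best[i - 1][j - 1] + bonus)
            best[i][j] = score

    result = best[len(a)][len(e)]
    return None if result == NO_MATCH else result


if __name__ == "__main__":
    print(match_score("ACM", "Association for Computing Machinery"))
    print(match_score("XML", "Extensible Markup Language"))
    print(match_score("NASA", "European Space Agency"))
```

A score of this kind could serve as one feature for a learned classifier, in the spirit of the hybrid approach described above, although the features actually used in this work are not given in the abstract.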