A machine learning approach to acronym generation

Authors:
Yoshimasa Tsuruoka;Sophia Ananiadou;Jun'ichi Tsujii
Affiliations:
Japan Science and Technology Agency, Japan and The University of Tokyo, Japan;Salford University, United Kingdom;The University of Tokyo, Japan and Japan Science and Technology Agency, Japan
Venue:
ISMB '05 Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics
Year:
2005

Abstract

This paper presents a machine learning approach to acronym generation. We formalize the generation process as a sequence labeling problem on the letters in the definition (expanded form), so that a variety of Markov modeling approaches can be applied to the task. To construct training and test data, we extracted acronym-definition pairs from MEDLINE abstracts and manually annotated each pair with positional information about the letters in the acronym. We built a tagger based on a maximum entropy Markov model (MEMM) from this training data and evaluated its performance on acronym generation. Experimental results show that our machine learning method performs significantly better than the standard heuristic rule for acronym generation, and that it yields multiple candidate acronyms together with their likelihoods, expressed as probabilities.