A maximum entropy approach to natural language processing
Computational Linguistics
SaRAD: a Simple and Robust Abbreviation Dictionary
Bioinformatics
Evaluation and extension of maximum entropy models with inequality constraints
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Journal of Computer Science and Technology
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Learning Abbreviations from Chinese and English Terms by Modeling Non-Local Information
ACM Transactions on Asian Language Information Processing (TALIP)
Hi-index | 0.00 |
This paper presents a machine learning approach to acronym generation. We formalize the generation process as a sequence labeling problem on the letters in the definition (expanded form) so that a variety of Markov modeling approaches can be applied to this task. To construct the data for training and testing, we extracted acronym-definition pairs from MEDLINE abstracts and manually annotated each pair with positional information about the letters in the acronym. We have built an MEMM-based tagger using this training data set and evaluated the performance of acronym generation. Experimental results show that our machine learning method gives significantly better performance than that achieved by the standard heuristic rule for acronym generation and enables us to obtain multiple candidate acronyms together with their likelihoods represented in probability values.