Using SVM to Extract Acronyms from Text

Authors:
Jun Xu; Yalou Huang
Affiliations:
College of Software, Nankai University, No. 94 Weijin Road, 300071, Tianjin, China; College of Software, Nankai University, No. 94 Weijin Road, 300071, Tianjin, China
Venue:
Soft Computing - A Fusion of Foundations, Methodologies and Applications
Year:
2006

Abstract

The paper addresses the problem of extracting acronyms and their expansions from text. We propose an approach based on support vector machines (SVM). First, all likely acronyms are identified using heuristic rules. Second, expansion candidates are generated from the text surrounding each acronym. Finally, an SVM model is used to select the genuine expansions. Analysis shows that the proposed approach offers savings over conventional rule-based approaches. Experimental results show that our approach outperforms a rule-based baseline method. We also show that the trained SVM model is generic and adapts easily to other domains.
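The three-step pipeline described above (heuristic acronym detection, candidate generation from nearby text, SVM selection) can be sketched as a toy illustration. This is not the authors' implementation. The acronym heuristic, the candidate window of min(n + 5, 2n) words, the feature set, the stopword list, and the helper names (`find_acronyms`, `candidates`, `features`, `train_linear_svm`, `extract`) are all hypothetical. The classifier is a plain linear SVM trained by dual coordinate descent on one labeled sentence, not the paper's model or data.

```python
import re

# Hypothetical stopword list used by one feature (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "on", "to", "with", "by"}


def tokenize(text):
    """Split text into word tokens, dropping punctuation such as parentheses."""
    return re.findall(r"[A-Za-z][A-Za-z0-9-]*", text)


def find_acronyms(tokens):
    """Step 1 (assumed heuristic): 2-10 characters, starts uppercase, at least two
    uppercase letters, and at least half of its letters uppercase."""
    hits = []
    for i, tok in enumerate(tokens):
        letters = [c for c in tok if c.isalpha()]
        upper = sum(c.isupper() for c in letters)
        if 2 <= len(tok) <= 10 and tok[0].isupper() and upper >= 2 and 2 * upper >= len(letters):
            hits.append(i)
    return hits


def candidates(tokens, idx):
    """Step 2 (assumed window): every word span ending just before the acronym
    with at most min(n + 5, 2n) words, where n is the acronym length."""
    n = len(tokens[idx])
    max_words = min(n + 5, 2 * n)
    return [tokens[j:idx] for j in range(max(0, idx - max_words), idx)]


def features(words, acronym):
    """Hypothetical feature vector for one (candidate span, acronym) pair."""
    acr = acronym.lower()
    initials = [w[0].lower() for w in words]
    k = 0  # acronym letters matched, in order, by word initials
    for ch in initials:
        if k < len(acr) and ch == acr[k]:
            k += 1
    return [
        k / len(acr),                                    # in-order initial coverage
        1.0 if initials[0] == acr[0] else 0.0,           # first letters agree
        len(words) / len(acr),                           # words per acronym letter
        1.0 if words[0].lower() in STOPWORDS else 0.0,   # span starts with a stopword
        1.0,                                             # constant (regularized) bias
    ]


def score(w, x):
    return sum(a * b for a, b in zip(w, x))


def train_linear_svm(X, y, C=10.0, epochs=1000):
    """Step 3 training: linear hinge-loss SVM via dual coordinate descent."""
    w = [0.0] * len(X[0])
    alpha = [0.0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            q = sum(v * v for v in xi)                  # > 0 thanks to the bias feature
            grad = yi * score(w, xi) - 1.0              # dual gradient for alpha[i]
            new = min(max(alpha[i] - grad / q, 0.0), C) # project onto [0, C]
            delta = new - alpha[i]
            if delta:
                alpha[i] = new
                w = [a + delta * yi * b for a, b in zip(w, xi)]
    return w


def build_training_set(text, gold):
    """Label the gold span +1 and every other candidate -1 (toy data)."""
    tokens = tokenize(text)
    X, y = [], []
    for idx in find_acronyms(tokens):
        acr = tokens[idx]
        for cand in candidates(tokens, idx):
            X.append(features(cand, acr))
            y.append(1 if " ".join(cand) == gold.get(acr) else -1)
    return X, y


def extract(text, w):
    """Steps 1-3: map each acronym to its best-scoring candidate, if positive."""
    tokens = tokenize(text)
    result = {}
    for idx in find_acronyms(tokens):
        acr = tokens[idx]
        scored = [(score(w, features(c, acr)), c) for c in candidates(tokens, idx)]
        if scored:
            best_score, best = max(scored, key=lambda p: p[0])
            if best_score > 0:
                result[acr] = " ".join(best)
    return result


train_text = "We use support vector machines (SVM) and natural language processing (NLP) here."
gold = {"SVM": "support vector machines", "NLP": "natural language processing"}
X, y = build_training_set(train_text, gold)
w = train_linear_svm(X, y)
print(extract("The paper studies hidden Markov models (HMM) for tagging.", w))
# → {'HMM': 'hidden Markov models'}
```

Labeling the gold span positive and every other candidate negative turns expansion selection into binary classification. At extraction time the highest-scoring candidate is kept only if its score is positive, so an acronym with no plausible expansion nearby yields no output.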