Automatic expansion of abbreviations by using context and character information

Authors:
Akira Terada;Takenobu Tokunaga;Hozumi Tanaka
Affiliations:
Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ôokayama Meguro, Tokyo 152-8552, Japan;Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ôokayama Meguro, Tokyo 152-8552, Japan;Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ôokayama Meguro, Tokyo 152-8552, Japan
Venue:
Information Processing and Management: an International Journal
Year:
2004

Abstract

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain-specific. In this paper, we propose a method for automatically expanding abbreviations using context and character information. Previous studies searched dictionaries for abbreviation expansion candidates (candidate words for the original form of an abbreviation). Instead of a dictionary, we use a corpus from the same field that contains few abbreviations. We score the adequacy of each expansion candidate by the similarity between the context of the target abbreviation and the context of the candidate. The similarity is calculated using a vector space model in which the vector elements correspond to the words surrounding the target abbreviation and those surrounding its expansion candidate. Experiments on approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
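
The context-similarity idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that candidates come from a same-field corpus and are scored with a vector space model over surrounding words. Everything else here is an assumption made for illustration: the character filter (the abbreviation's letters must appear in order in the candidate, with the same first letter), the window size of two words, raw co-occurrence counts as vector elements, cosine similarity as the measure, and all function names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_subsequence(abbr, word):
    """Assumed character filter: abbr's letters occur in word, in order,
    and both start with the same letter."""
    abbr, word = abbr.lower(), word.lower()
    if not abbr or not word or abbr[0] != word[0]:
        return False
    remaining = iter(word)
    return all(ch in remaining for ch in abbr)


def context_vector(docs, target, window=2):
    """Count the words within `window` positions of each occurrence of target."""
    target = target.lower()
    vec = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == target:
                vec.update(tokens[max(0, i - window):i])
                vec.update(tokens[i + 1:i + 1 + window])
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def rank_expansions(abbr, abbr_docs, corpus_docs, window=2):
    """Rank corpus words that pass the character filter by context similarity."""
    abbr_vec = context_vector(abbr_docs, abbr, window)
    vocab = {tok for doc in corpus_docs for tok in tokenize(doc)}
    candidates = [w for w in vocab
                  if len(w) > len(abbr) and is_subsequence(abbr, w)]
    scored = [(w, cosine(abbr_vec, context_vector(corpus_docs, w, window)))
              for w in candidates]
    # Highest similarity first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    abbr_docs = ["left eng failed during climb"]
    corpus_docs = ["left engine failed during climb",
                   "the english manual was updated"]
    for word, score in rank_expansions("eng", abbr_docs, corpus_docs):
        print(f"{word}\t{score:.2f}")
```

In this toy example both "engine" and "english" pass the character filter for "eng", but only "engine" shares surrounding words with the abbreviation, so it ranks first. A realistic setup would likely need term weighting and a more careful candidate filter than the one assumed here.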