Automatic expansion of abbreviations by using context and character information

  • Authors:
  • Akira Terada;Takenobu Tokunaga;Hozumi Tanaka

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ôokayama Meguro, Tokyo 152-8552, Japan;Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ôokayama Meguro, Tokyo 152-8552, Japan;Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ôokayama Meguro, Tokyo 152-8552, Japan

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2004

Abstract

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain-specific. In this paper, we propose a method for the automatic expansion of abbreviations that uses context and character information. Previous studies searched dictionaries for abbreviation expansion candidates (candidate words for the original forms of abbreviations). Instead of a dictionary, we use a corpus from the same field that contains few abbreviations. We calculate the adequacy of each abbreviation expansion candidate from the similarity between the context of the target abbreviation and the context of the candidate. The similarity is calculated using a vector space model in which the vector elements correspond to the words surrounding the target abbreviation and those surrounding its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
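
The context-similarity scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: cosine similarity as the vector space measure, raw word counts in a fixed window as the vector elements, and an in-order character (subsequence) test standing in for the "character information". The function names and the `window` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            vec.update(tokens[max(0, i - window):i])       # words to the left
            vec.update(tokens[i + 1:i + 1 + window])       # words to the right
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def is_subsequence(abbr, word):
    """Assumed character criterion: letters of `abbr` occur in `word` in order."""
    it = iter(word.lower())
    return all(ch in it for ch in abbr.lower())


def rank_expansions(abbr, abbr_tokens, corpus_tokens, window=2):
    """Rank corpus words that pass the character test by context similarity."""
    target = context_vector(abbr_tokens, abbr, window)
    candidates = {w for w in set(corpus_tokens)
                  if w != abbr and is_subsequence(abbr, w)}
    scored = [(w, cosine(target, context_vector(corpus_tokens, w, window)))
              for w in candidates]
    # Highest similarity first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy data, invented for illustration only.
abbr_text = "check alt before landing".split()
corpus = "check altitude before landing and report alternate airport".split()
ranking = rank_expansions("alt", abbr_text, corpus)
```

In this toy input both "altitude" and "alternate" pass the character test, and the context vectors are what separate them: "altitude" shares all of its neighbouring words with "alt", while "alternate" shares none.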