Automatic expansion of abbreviations in Chinese news text

  • Authors:
  • Guohong Fu; Kang-Kwong Luke; GuoDong Zhou; Ruifeng Xu

  • Affiliations:
  • Department of Linguistics, The University of Hong Kong, Hong Kong; Department of Linguistics, The University of Hong Kong, Hong Kong; School of Computer Science and Technology, Suzhou University, China; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong

  • Venue:
  • AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
  • Year:
  • 2006

Abstract

This paper presents an n-gram-based approach to Chinese abbreviation expansion. We distinguish reduced abbreviations from non-reduced abbreviations, which are created by elimination or generalization. For a reduced abbreviation, a mapping table is compiled that maps each of its short-words to a set of long-words, and a bigram-based Viterbi algorithm is then applied to decode the most appropriate combination of long-words as its full form. For a non-reduced abbreviation, a dictionary of non-reduced abbreviation/full-form pairs is used to generate expansion candidates, and a disambiguation technique based on bigram word segmentation is then employed to select the proper expansion. Evaluation on an abbreviation-expanded corpus built from the PKU corpus showed that the proposed system achieved an average recall of 82.9% and an average precision of 85.5% across different types of abbreviations in Chinese news text.
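
The decoding step for reduced abbreviations can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-alpha smoothing, the fallback for unknown short-words, and the toy mapping table and bigram counts are all assumptions made for illustration. The abstract only states that a mapping table supplies long-word candidates for each short-word and that a bigram-based Viterbi search picks the combination.

```python
import math

START, END = "<s>", "</s>"  # sentence-boundary markers (assumed convention)


def make_bigram_scorer(counts, vocab_size, alpha=1.0):
    """Return log P(w2 | w1) with add-alpha smoothing (smoothing choice is an assumption)."""
    totals = {}
    for (w1, _), c in counts.items():
        totals[w1] = totals.get(w1, 0) + c

    def logprob(w1, w2):
        num = counts.get((w1, w2), 0) + alpha
        den = totals.get(w1, 0) + alpha * vocab_size
        return math.log(num / den)

    return logprob


def viterbi_expand(short_words, mapping, logprob):
    """Pick the long-word sequence with the highest bigram log-probability."""
    # layer maps each candidate at the current position to (best score, best path).
    layer = {START: (0.0, [])}
    for sw in short_words:
        # Assumption: a short-word missing from the table expands to itself.
        candidates = mapping.get(sw, [sw])
        new_layer = {}
        for cand in candidates:
            best_prev = max(layer, key=lambda p: layer[p][0] + logprob(p, cand))
            score = layer[best_prev][0] + logprob(best_prev, cand)
            new_layer[cand] = (score, layer[best_prev][1] + [cand])
        layer = new_layer
    # Close the sequence with the end marker before choosing the winner.
    best_last = max(layer, key=lambda p: layer[p][0] + logprob(p, END))
    return layer[best_last][1]


# Hypothetical toy data, not taken from the paper or the PKU corpus.
mapping = {"北": ["北京", "北方"], "大": ["大学", "大会"]}
counts = {
    (START, "北京"): 10, (START, "北方"): 10,
    ("北京", "大学"): 50, ("北方", "大学"): 2, ("北京", "大会"): 1,
    ("大学", END): 5, ("大会", END): 5,
}
scorer = make_bigram_scorer(counts, vocab_size=6)
expansion = viterbi_expand(["北", "大"], mapping, scorer)
print("".join(expansion))
```

With these toy counts the abbreviation 北大 decodes to 北京大学, because the bigram (北京, 大学) dominates the alternatives. In a real system the counts would come from a segmented training corpus and the mapping table from the compiled short-word/long-word resource the abstract describes.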