Vocabulary expansion through automatic abbreviation generation for Chinese voice search

  • Authors:
  • Dong Yang; Yi-Cheng Pan; Sadaoki Furui

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan (all authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2012

Abstract

Long organization names are often abbreviated in spoken Chinese, and abbreviated utterances cannot be recognized correctly if the abbreviations are not included in the recognition vocabulary. It is therefore important to generate abbreviations of organization names automatically and add them to the vocabulary. Generating Chinese abbreviations is much more complex than generating English abbreviations, which are mostly acronyms and truncations. In this paper, we propose a new hybrid method for automatically generating Chinese abbreviations, and we use the output of the abbreviation model to expand the vocabulary for voice search. In our abbreviation modeling, we treat abbreviation generation as a tagging problem and use conditional random fields (CRFs) as the tagging tool; the CRF output is then re-ranked using a length model and web information. In the vocabulary expansion, because a name can have multiple abbreviations and the top-1 candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments on abbreviation modeling, the proposed method achieved a top-10 coverage of 88.3%. For voice search with abbreviated utterances, adding the top-10 abbreviation candidates to the vocabulary improved the full-name search accuracy from 16.9% to 79.2%.
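The pipeline outlined in the abstract (tag each character, re-rank with a length model, add the top-k candidates to the vocabulary) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The per-character keep probabilities stand in for a trained CRF and treat tags as independent. The `LENGTH_PRIOR` values, the function names, and the `weight` parameter are all hypothetical. The web-information re-ranking step is omitted.

```python
from itertools import product
from math import log

# Hypothetical prior over abbreviation length in characters (not from the paper).
LENGTH_PRIOR = {2: 0.5, 3: 0.3, 4: 0.2}


def tag_candidates(full_name, keep_prob):
    """Score every keep/skip tagging of the characters in full_name.

    keep_prob[i] is an assumed probability (strictly between 0 and 1) that
    character i is kept. Returns a dict mapping abbreviation -> log score.
    """
    best = {}
    n = len(full_name)
    for tags in product((0, 1), repeat=n):
        kept = sum(tags)
        if kept < 2 or kept == n:
            continue  # ignore 0- or 1-character outputs and the full name itself
        abbr = "".join(ch for ch, t in zip(full_name, tags) if t)
        score = sum(log(p if t else 1.0 - p) for p, t in zip(keep_prob, tags))
        # Repeated characters can give the same string; keep the best score.
        if abbr not in best or score > best[abbr]:
            best[abbr] = score
    return best


def rerank(scored, k=10, weight=1.0):
    """Combine the tagging score with the length prior and return the top k."""
    def total(item):
        abbr, score = item
        return score + weight * log(LENGTH_PRIOR.get(len(abbr), 1e-6))

    ranked = sorted(scored.items(), key=total, reverse=True)
    return [abbr for abbr, _ in ranked[:k]]


def expand_vocabulary(vocab, full_name, abbreviations):
    """Map the full name and each abbreviation candidate to the full name."""
    vocab.setdefault(full_name, set()).add(full_name)
    for abbr in abbreviations:
        vocab.setdefault(abbr, set()).add(full_name)
    return vocab


name = "北京大学"  # Peking University, commonly abbreviated to 北大
top10 = rerank(tag_candidates(name, [0.9, 0.2, 0.9, 0.1]), k=10)
vocab = expand_vocabulary({}, name, top10)
```

Exhaustive enumeration grows as 2^n in the name length, so it is only practical for short names; a sequence model would more plausibly supply an n-best list directly. Mapping every candidate back to the full name is one simple way to let an abbreviated utterance retrieve the full-name entry.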