Vocabulary expansion through automatic abbreviation generation for Chinese voice search

Authors:
Dong Yang;Yi-Cheng Pan;Sadaoki Furui
Affiliations:
Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan;Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan;Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan
Venue:
Computer Speech and Language
Year:
2012

Abstract

Long organization names are often abbreviated in spoken Chinese, and an abbreviated utterance cannot be recognized correctly unless the abbreviation is included in the recognition vocabulary. It is therefore important to generate abbreviations for organization names automatically and add them to the vocabulary. Generating Chinese abbreviations is much more complex than generating English abbreviations, which are mostly acronyms and truncations. In this paper, we propose a new hybrid method for automatically generating Chinese abbreviations, and we use the output of the abbreviation model to expand the vocabulary for voice search. In the abbreviation modeling, we treat abbreviation generation as a tagging problem and use conditional random fields (CRFs) as the tagger; the CRF output is then re-ranked using a length model and web information. In the vocabulary expansion, because a full name can have multiple abbreviations and the top-1 candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments, the proposed abbreviation model achieved a top-10 coverage of 88.3%. For voice search with abbreviated utterances, adding the top-10 abbreviation candidates to the vocabulary improved the full-name search accuracy from 16.9% to 79.2%.
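
The tagging view of abbreviation generation can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each character of the full name receives a keep/skip tag, replaces the CRF with independent per-character keep probabilities, and combines that score with a simple length model before taking the top-k candidates. The function name `generate_candidates`, the parameters `keep_prob` and `length_prob`, and all numeric values are hypothetical, and the web-information re-ranking step is omitted.

```python
from itertools import product
from math import log


def generate_candidates(full_name, keep_prob, length_prob, top_k=10):
    """Rank abbreviation candidates for full_name (illustrative only).

    keep_prob[i]   : assumed probability, strictly between 0 and 1, that
                     character i is kept (a stand-in for a trained CRF).
    length_prob[n] : assumed probability that an abbreviation has n
                     characters (a stand-in for a length model).
    """
    scored = []
    # Enumerate every keep (1) / skip (0) tag sequence. This is exponential
    # in the name length, so it only suits short names; a real tagger would
    # more likely use n-best decoding.
    for tags in product((0, 1), repeat=len(full_name)):
        n_kept = sum(tags)
        # Skip the empty string and the unabbreviated full name.
        if n_kept == 0 or n_kept == len(full_name):
            continue
        # Skip lengths the length model gives no probability mass.
        if length_prob.get(n_kept, 0.0) <= 0.0:
            continue
        tag_score = sum(
            log(p if t else 1.0 - p) for t, p in zip(tags, keep_prob)
        )
        score = tag_score + log(length_prob[n_kept])
        abbreviation = "".join(ch for ch, t in zip(full_name, tags) if t)
        scored.append((score, abbreviation))
    # Highest combined log-score first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [abbreviation for _, abbreviation in scored[:top_k]]


# Hypothetical scores for 北京大学 (Peking University), commonly shortened to 北大.
candidates = generate_candidates(
    "北京大学",
    keep_prob=[0.9, 0.1, 0.9, 0.1],
    length_prob={2: 0.7, 3: 0.3},
)
print(candidates[0])  # 北大
```

In a voice search setting, every string in the returned list, not only the first, would be added to the recognition vocabulary as an alternative form of the full name.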