Long organization names are often abbreviated in spoken Chinese, and abbreviated utterances cannot be recognized correctly unless the abbreviations are included in the recognition vocabulary. It is therefore important to automatically generate abbreviations for organization names and add them to the vocabulary. Generating Chinese abbreviations is much more complex than generating English abbreviations, which are mostly acronyms and truncations. In this paper, we propose a new hybrid method for automatically generating Chinese abbreviations, and we use the output of the abbreviation model to expand the vocabulary for voice search. In our abbreviation modeling, we treat abbreviation generation as a tagging problem and use conditional random fields (CRF) as the tagging tool; the CRF output is then re-ranked by a length model and web information. In the vocabulary expansion, because an organization name can have multiple abbreviations and the top-1 candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments on abbreviation modeling, the proposed method achieved a top-10 coverage of 88.3%. For voice search with abbreviated utterances, adding the top-10 abbreviation candidates to the vocabulary improved the full-name search accuracy from 16.9% to 79.2%.
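The tagging formulation can be illustrated with a minimal Python sketch. It is not the authors' implementation: each character of the full name is tagged keep (`K`) or skip (`S`), but the per-character keep probabilities below are made-up numbers treated as independent, standing in for a trained CRF, and the length prior is likewise invented. Re-ranking by web information is omitted. The names `apply_tags`, `top_k_candidates`, and `expand_vocabulary` are hypothetical.

```python
import math
from itertools import product


def apply_tags(name, tags):
    """Build an abbreviation by keeping the characters tagged 'K'."""
    return "".join(ch for ch, tag in zip(name, tags) if tag == "K")


def top_k_candidates(name, keep_prob, length_prior, k=10):
    """Enumerate tag sequences, score them, and return the k best abbreviations.

    keep_prob[i] is an assumed probability (strictly between 0 and 1) that
    character i is kept. A real system would take sequence scores from a CRF.
    length_prior maps abbreviation length to an assumed prior probability.
    Exhaustive enumeration is only practical for short names.
    """
    scores = {}
    for tags in product("KS", repeat=len(name)):
        abbr = apply_tags(name, tags)
        if not abbr or abbr == name:
            continue  # skip the empty string and the unabbreviated name
        log_p = sum(
            math.log(p if tag == "K" else 1.0 - p)
            for p, tag in zip(keep_prob, tags)
        )
        # Re-rank with the length model (tiny floor for unseen lengths).
        log_p += math.log(length_prior.get(len(abbr), 1e-6))
        scores[abbr] = max(scores.get(abbr, -math.inf), log_p)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def expand_vocabulary(vocabulary, name, candidates):
    """Map each candidate abbreviation back to its full name."""
    for abbr in candidates:
        vocabulary.setdefault(abbr, name)
    return vocabulary


full_name = "北京大学"                    # Peking University
keep_prob = [0.9, 0.1, 0.9, 0.1]          # assumed values, not from the paper
length_prior = {1: 0.1, 2: 0.6, 3: 0.3}   # assumed values, not from the paper

candidates = top_k_candidates(full_name, keep_prob, length_prior, k=10)
vocabulary = expand_vocabulary({full_name: full_name}, full_name, candidates)
print(candidates[0])
```

With these illustrative numbers the top candidate is 北大, the common abbreviation of 北京大学. Adding all ten candidates to the vocabulary, each mapped to the full name, mirrors the top-10 expansion described above: lower-ranked candidates cost little to include and cover names that have more than one abbreviation in use.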