Pretreatment for speech machine translation

  • Authors:
  • Xiaofei Zhang;Chong Feng;Heyan Huang

  • Affiliations:
  • Research Center of Computer and Language Information Engineering, Chinese Academy of Sciences, Beijing, China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

  • Venue:
  • ICCCI'10: Proceedings of the Second International Conference on Computational Collective Intelligence: Technologies and Applications - Volume Part II
  • Year:
  • 2010

Abstract

Natural spoken language contains many meaningless modal particles and dittographies (repeated words); moreover, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is rather poor when ASR results are fed directly into MT (machine translation). It is therefore necessary to transform the irregular ASR output into normative text suitable for machine translation. In this paper, a pretreatment approach based on a conditional random field model is introduced to delete the meaningless modal particles and dittographies, correct the recognition errors, and punctuate the ASR results before machine translation. Experiments show that an MT BLEU score of 0.2497 is obtained, an improvement of 18.4% over the MT baseline without pretreatment.
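
The paper does not include implementation details in this abstract, so the following is only a minimal sketch of how a CRF-based pretreatment of this kind can be framed as token labeling: each ASR token is tagged KEEP, DELETE (filler or dittography), or PUNCT_<mark> (insert punctuation after the token). The toolkit (sklearn-crfsuite), the label scheme, and the feature set are assumptions for illustration, and the error-correction sub-task described in the paper would require a different edit scheme not shown here.

```python
# Hypothetical sketch of CRF-based ASR-output pretreatment, NOT the authors' code.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple lexical and context features for token i (illustrative only)."""
    return {
        "word": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        "repeats_prev": i > 0 and tokens[i] == tokens[i - 1],  # dittography cue
    }

def sent2features(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

# Toy training data: ASR-style token sequences with gold pretreatment labels.
train_sents = [
    (["well", "I", "I", "want", "a", "ticket"],
     ["DELETE", "DELETE", "KEEP", "KEEP", "KEEP", "PUNCT_."]),
]
X_train = [sent2features(toks) for toks, _ in train_sents]
y_train = [labels for _, labels in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)

def pretreat(tokens):
    """Apply predicted labels: drop DELETE tokens, append predicted punctuation."""
    out = []
    for tok, label in zip(tokens, crf.predict([sent2features(tokens)])[0]):
        if label == "DELETE":
            continue
        out.append(tok)
        if label.startswith("PUNCT_"):
            out.append(label.split("_", 1)[1])
    return " ".join(out)

print(pretreat(["uh", "I", "I", "need", "a", "ticket"]))
```

The pretreated text, with fillers and repetitions removed and punctuation restored, would then be passed to the downstream MT system in place of the raw ASR output.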