Increase the efficiency of English-Chinese sentence alignment: target range restriction and empirical selection of stop words

Authors:
Wing-Kwong Wong;Hsi-Hsun Yang;Wei-Lung Shen;Sheng-Kai Yin;Sheng-Cheng Hsu
Affiliations:
Department of Electronic Engineering, National Yunlin University of Science & Technology, Douliou, Yunlin, Taiwan, R.O.C.;Graduate School of Engineering Science & Technology, National Yunlin University of Science & Technology, Douliou, Yunlin, Taiwan, R.O.C.;Institute of Computer Science and Information Engineering, National Yunlin University of Science & Technology, Douliou, Yunlin, Taiwan, R.O.C.;Graduate School of Engineering Science & Technology, National Yunlin University of Science & Technology, Douliou, Yunlin, Taiwan, R.O.C.;Department of Media and Design, Asia University, Wufeng, Taichung, Taiwan, R.O.C.
Venue:
WSEAS Transactions on Computers
Year:
2008

Abstract

In this paper, we use a lexical method to align sentences in an English-Chinese corpus. Past research shows that dictionary-based alignment requires a large amount of word matching and many dictionary lookups. To address these two issues, we first restrict the range of candidate target sentences based on the location of the source sentence relative to the beginning of the text. In addition, careful empirical selection of stop words, based on word frequencies in the source text, reduces the number of dictionary lookups. Experimental results show that the amount of word matching can be cut by 75% and the number of dictionary lookups by as much as 43% without sacrificing precision or recall. A second experiment used twenty New York Times articles containing 598 sentences and 18,395 words. The resulting precision is 95.6% and the recall is 93.8%. Among all predicted alignments, 86% are 1:1 (one source sentence to one target sentence), 8% are 1:2, and 6% are 2:1. Further analysis shows that most errors occur in alignments of types 1:2 and 2:1, so future work should focus on these two alignment types.
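
The two efficiency ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the proportional mapping from source position to target position, the `window` size, the `top_k` cutoff, and the function names `candidate_range` and `frequent_stop_words` are all assumptions made for illustration, since the abstract does not give these details.

```python
from collections import Counter


def candidate_range(src_index, n_src, n_tgt, window=3):
    """Return target sentence indices worth comparing with one source sentence.

    Assumption: a source sentence at relative position p in its text is
    expected near relative position p in the target text, so only targets
    within `window` sentences of that point are considered.
    """
    if n_src <= 0 or n_tgt <= 0:
        return range(0)
    # Map the source index proportionally onto the target text.
    center = round(src_index * (n_tgt - 1) / max(n_src - 1, 1))
    lo = max(0, center - window)
    hi = min(n_tgt, center + window + 1)
    return range(lo, hi)


def frequent_stop_words(source_sentences, top_k=2):
    """Pick the `top_k` most frequent source words as stop words.

    Assumption: very frequent words carry little alignment evidence, so
    skipping them saves dictionary lookups. The paper selects stop words
    empirically; a fixed cutoff is used here only to keep the sketch short.
    """
    counts = Counter(
        word.lower() for sentence in source_sentences for word in sentence.split()
    )
    return {word for word, _ in counts.most_common(top_k)}


def words_to_look_up(sentence, stop_words):
    """Return the words of a source sentence that still need a dictionary lookup."""
    return [w for w in sentence.lower().split() if w not in stop_words]
```

With a window of 3, each source sentence is compared with at most 7 target sentences instead of every sentence in the target text, which is where the saving in word matching would come from under these assumptions.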