Bilingual chunk alignment based on interactional matching and probabilistic latent semantic indexing

Authors:
Feifan Liu; Qianli Jin; Jun Zhao; Bo Xu
Affiliations:
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing (all four authors)
Venue:
IJCNLP '04: Proceedings of the First International Joint Conference on Natural Language Processing
Year:
2004

Abstract

An integrated method for bilingual chunk partition and alignment, called “Interactional Matching”, is proposed in this paper. Unlike previous work, our method tries to obtain as much of the necessary information as possible from the bilingual corpora themselves, and, through bilingual constraints, it automatically builds one-to-one chunk pairs, each associated with a chunk-pair confidence coefficient. Also, unlike collocation extraction methods, our method partitions bilingual sentences entirely into chunks, leaving no fragments. Furthermore, by using Probabilistic Latent Semantic Indexing (PLSI), the method can handle not only compositional chunks but also non-compositional ones. Experiments show that, for the overall process (partition and alignment together), our method obtains 85% precision with 57% recall on written-language chunk pairs and 78% precision with 53% recall on spoken-language chunk pairs.
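The abstract does not say exactly how PLSI enters the alignment, so the following is only an illustrative sketch under stated assumptions. It fits the standard PLSI aspect model P(d, w) = sum_z P(z) P(d|z) P(w|z) by EM on a small count matrix whose rows are assumed to be aligned sentence pairs and whose columns are assumed to be candidate chunks from both languages, and then compares chunks through their latent-topic profiles P(z|w). The co-occurrence setup, the toy data, and the helper names (`plsi`, `topic_profile`, `cosine`) are hypothetical and are not the authors' procedure.

```python
import math
import random

# Illustrative sketch only: a generic PLSI fit plus a hypothetical way of
# comparing chunks across languages; not the paper's actual procedure.


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsi(counts, n_topics, n_iter=200, seed=0):
    """Fit the PLSI aspect model P(d, w) = sum_z P(z) P(d|z) P(w|z) by EM.

    counts[d][w] is the co-occurrence count n(d, w). Returns
    (p_z, p_d_z, p_w_z), where p_d_z[z][d] = P(d|z) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random strictly positive start; it breaks the symmetry between topics.
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [normalize([rng.random() + 0.1 for _ in range(n_docs)]) for _ in range(n_topics)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_z = [0.0] * n_topics
        exp_dz = [[0.0] * n_docs for _ in range(n_topics)]
        exp_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z) P(d|z) P(w|z).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = n * joint[z] / total  # expected count n(d, w) P(z | d, w)
                    exp_z[z] += r
                    exp_dz[z][d] += r
                    exp_wz[z][w] += r
        # M-step: re-estimate each distribution from the expected counts.
        p_z = normalize(exp_z)
        p_d_z = [normalize(row) for row in exp_dz]
        p_w_z = [normalize(row) for row in exp_wz]
    return p_z, p_d_z, p_w_z


def topic_profile(p_z, p_w_z, w):
    """P(z | w) by Bayes' rule, proportional to P(z) P(w|z)."""
    return normalize([p_z[z] * p_w_z[z][w] for z in range(len(p_z))])


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy data: rows are aligned sentence pairs, columns are chunks.
# Columns 0-1 are English chunks, columns 2-3 are Chinese chunks; chunk 0
# co-occurs with chunk 2, and chunk 1 co-occurs with chunk 3.
counts = [
    [2, 0, 2, 0],
    [1, 0, 1, 0],
    [0, 3, 0, 3],
    [0, 1, 0, 1],
]
p_z, p_d_z, p_w_z = plsi(counts, n_topics=2)
profiles = [topic_profile(p_z, p_w_z, w) for w in range(4)]
print("sim(en0, zh2) =", round(cosine(profiles[0], profiles[2]), 3))
print("sim(en0, zh3) =", round(cosine(profiles[0], profiles[3]), 3))
```

On this block-structured toy matrix, EM should place each English chunk and the Chinese chunk it co-occurs with in the same latent topic, so the matched pair's profile similarity approaches 1 while the mismatched pair's approaches 0. Matching through shared latent topics, rather than composing word-by-word translations, is one plausible reason PLSI can help with non-compositional chunks, whose translations cannot be assembled from the translations of their parts.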