Combining sentence length with location information to align monolingual parallel texts

Authors:
Weigang Li;Ting Liu;Sheng Li
Affiliations:
Harbin Institute of Technology, Information Retrieval Laboratory, School of Computer Science and Technology, Harbin, P.R. China (all authors)
Venue:
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Year:
2004

Abstract

Abundant Chinese paraphrase resources can be obtained from the Internet in the form of different Chinese translations of the same foreign masterpiece. A paraphrase corpus is a corpus of sentence pairs that convey the same information. The irregular characteristics of real monolingual parallel texts, especially the lack of strictly aligned paragraph boundaries between two translations, pose a challenge for alignment technology, and traditional alignment methods designed for bilingual texts have difficulty handling them. This paper describes a new method for aligning real monolingual parallel texts using the length and location information of sentence pairs. The model is motivated by the observation that the sentences of a pair with a given length tend to occupy similar locations within their respective texts. So far, a paraphrase corpus of about fifty thousand sentence pairs has been constructed.
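To make the idea of combining length and location concrete, the following is a minimal Python sketch. It is not the authors' model, which the abstract does not specify. The function names (`relative_positions`, `pair_score`, `best_matches`), the character-based length measure, the linear combination, and the `weight` parameter are all assumptions introduced here for illustration only.

```python
def relative_positions(sentences):
    """Relative start offset (0.0 to 1.0) of each sentence, measured in characters.

    Assumes every sentence is a non-empty string.
    """
    total = sum(len(s) for s in sentences)
    positions, offset = [], 0
    for s in sentences:
        positions.append(offset / total)
        offset += len(s)
    return positions


def pair_score(len_a, loc_a, len_b, loc_b, weight=0.5):
    """Hypothetical similarity in [0, 1] mixing length and location agreement."""
    # Length agreement: ratio of the shorter to the longer sentence.
    length_sim = min(len_a, len_b) / max(len_a, len_b)
    # Location agreement: 1 minus the gap between relative positions.
    location_sim = 1.0 - abs(loc_a - loc_b)
    return weight * length_sim + (1.0 - weight) * location_sim


def best_matches(text_a, text_b, weight=0.5):
    """For each sentence in text_a, return the index of the best-scoring sentence in text_b."""
    loc_a = relative_positions(text_a)
    loc_b = relative_positions(text_b)
    matches = []
    for i, sa in enumerate(text_a):
        scores = [
            pair_score(len(sa), loc_a[i], len(sb), loc_b[j], weight)
            for j, sb in enumerate(text_b)
        ]
        matches.append(max(range(len(text_b)), key=scores.__getitem__))
    return matches
```

Because relative positions are computed over the whole text, the sketch does not rely on paragraph boundaries, which is the property the abstract emphasizes. A real aligner would likely also need to handle one-to-many matches and use a global search, neither of which this greedy sketch attempts.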