Cross-lingual semantic interoperability has drawn significant research attention recently, as the number of digital libraries in non-English languages has grown exponentially. Cross-lingual information retrieval (CLIR) among European languages such as English, Spanish, and French has been widely explored, but CLIR between European and Oriental languages is still in its initial stages. To cross the language boundary, a corpus-based approach shows promise for overcoming the limitations of knowledge-based and controlled-vocabulary approaches. However, collecting parallel corpora between European and Oriental languages is difficult. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques based on these approaches and compare their performance in aligning the English and Chinese titles of parallel documents available on the Web.
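The two families of alignment approaches can be illustrated with a minimal Python sketch. This is not the method evaluated in the paper. The length-based score assumes a Gale-Church style normal model of the English-to-Chinese length ratio. The text-based score assumes a bilingual lexicon, and it matches translations as substrings so that the Chinese titles need no word segmentation. The constants `EXPECTED_RATIO` and `VARIANCE`, the toy lexicon, and the functions `length_score`, `lexical_score`, and `align_titles` are hypothetical placeholders.

```python
import math

# Hypothetical parameters (placeholders, not from the paper): expected Chinese
# characters per English character, and the per-character variance of that
# ratio. In practice these would be estimated from a hand-aligned sample.
EXPECTED_RATIO = 0.3
VARIANCE = 0.2


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_score(en: str, zh: str) -> float:
    """Length-based score (Gale-Church style): the two-tailed probability of a
    length deviation at least this large under a normal length-ratio model."""
    l1, l2 = len(en), len(zh)
    if l1 == 0:
        return 0.0
    delta = (l2 - l1 * EXPECTED_RATIO) / math.sqrt(l1 * VARIANCE)
    return 2.0 * (1.0 - normal_cdf(abs(delta)))


def lexical_score(en: str, zh: str, lexicon: dict[str, list[str]]) -> float:
    """Text-based score: fraction of English words found in the lexicon that
    have at least one translation occurring as a substring of the Chinese title."""
    words = [w.strip(".,:;!?").lower() for w in en.split()]
    known = [w for w in words if w in lexicon]
    if not known:
        return 0.0
    hits = sum(1 for w in known if any(t in zh for t in lexicon[w]))
    return hits / len(known)


def align_titles(en_titles, zh_titles, score):
    """Greedily pair each English title with its highest-scoring Chinese title."""
    return [max(zh_titles, key=lambda zh: score(en, zh)) for en in en_titles]


if __name__ == "__main__":
    # Hypothetical toy lexicon for illustration only.
    lexicon = {"digital": ["数字"], "library": ["图书馆"],
               "information": ["信息"], "retrieval": ["检索"]}
    en = ["Digital library", "Information retrieval"]
    zh = ["信息检索", "数字图书馆"]
    print(align_titles(en, zh, lambda e, z: lexical_score(e, z, lexicon)))
    # prints ['数字图书馆', '信息检索']
```

The sketch highlights one design tradeoff. Length alone cannot separate candidates of similar length: here both Chinese titles receive the same `length_score` against "Digital library". Lexical evidence does separate them, but only when the lexicon covers the title's words. The greedy `align_titles` may also assign one Chinese title to several English titles. A fuller aligner would combine both scores and enforce one-to-one pairing.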