This paper argues that the World Wide Web can be regarded not only as an information resource but also as a dynamic, multilingual, largely uncontrolled, easily accessible, and untagged language corpus. To support this idea, we implemented a method that extracts bilingual lexicons from parallel WWW pages by means of a two-stage alignment. Language pairs drawn from German, English, and Chinese were selected, but the implementation is independent of any particular natural language, domain, or markup.
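
The abstract does not say what the two alignment stages are, so the following is only a minimal sketch of one plausible reading: a first stage that pairs text segments from two parallel pages, and a second stage that pairs words across the aligned segments. The function names `align_segments` and `extract_lexicon`, the length-ratio filter, the Dice-coefficient scoring, and both thresholds are assumptions made for illustration, not the authors' method. Whitespace tokenization is also an assumption and would not work for Chinese without a word segmenter.

```python
from collections import Counter
from itertools import product


def align_segments(src_segments, tgt_segments, max_ratio=2.0):
    """Stage 1 (assumed): pair segments by position, keeping only pairs
    whose character lengths differ by at most a factor of max_ratio."""
    pairs = []
    for src, tgt in zip(src_segments, tgt_segments):
        longer = max(len(src), len(tgt))
        shorter = max(1, min(len(src), len(tgt)))  # avoid division by zero
        if longer / shorter <= max_ratio:
            pairs.append((src, tgt))
    return pairs


def extract_lexicon(pairs, min_dice=0.8):
    """Stage 2 (assumed): score candidate word pairs by the Dice coefficient
    of their co-occurrence in aligned segments and keep the strong ones."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    lexicon = {}
    for (s, t), n in joint.items():
        dice = 2 * n / (src_count[s] + tgt_count[t])
        if dice >= min_dice:
            lexicon[(s, t)] = dice
    return lexicon


# Toy, hand-written parallel segments (not data from the paper).
english = ["the house", "the dog", "a house"]
german = ["das haus", "der hund", "ein haus"]

pairs = align_segments(english, german)
lexicon = extract_lexicon(pairs)
```

On this toy input the pair ("house", "haus") co-occurs in every segment in which either word appears, so it passes the threshold, while ("the", "haus") does not. With so little data, spurious pairs such as ("dog", "der") also pass, which suggests why a larger collection of parallel pages would be needed in practice.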