World Wide Web - A Multilingual Language Resource

Authors:
Fang Li; Huanye Sheng; Wilhelm Weisweber
Venue:
WI '01 Proceedings of the First Asia-Pacific Conference on Web Intelligence: Research and Development
Year:
2001

Abstract

This paper argues that the World Wide Web can be regarded not only as an information resource but also as a language corpus that is dynamic, multilingual, minimally controlled, easily accessible and untagged. To support this claim, we implemented a method that extracts bilingual lexicons from parallel WWW pages by two-stage alignment. The language pairs studied are drawn from German, English and Chinese, but the implementation does not depend on any particular natural language, domain or markup.