Comparable fora

Authors:
Johanka Spoustová; Miroslav Spousta
Affiliations:
Charles University Prague, Czech Republic; Charles University Prague, Czech Republic
Venue:
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Year:
2011

Abstract

As the title suggests, our paper deals with web discussion fora, whose content can be considered a special type of comparable corpus. We discuss the potential of this vast amount of data, now available on the World Wide Web for nearly every language. It covers both general, common topics and the most obscure and specific ones. To illustrate our ideas, we present a case study of seven wedding discussion fora in five languages.