CoBWeb A Crawler for the Brazilian Web
SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Introduction to the special issue on the web as corpus
Computational Linguistics - Special issue on web as corpus
Using the web to obtain frequencies for unseen bigrams
Computational Linguistics - Special issue on web as corpus
Introduction to the special issue on computational linguistics using large corpora
Computational Linguistics - Special issue on using large corpora: I
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Scaling to very very large corpora for natural language disambiguation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Hi-index | 0.00 |
This paper presents CUCWeb, a 166 million word corpus for Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The developed architecture is quite general, so that it can be used to create corpora for other languages.