The use of the World Wide Web as a free source of large linguistic resources is a well-established idea. Such resources are cornerstones of fields such as lexicon-based categorization, information retrieval, machine translation, and information extraction. In this paper, we present an industrial focused web crawler for the automatic compilation of specialized corpora from the web. This application, developed within the framework of the TTC project, is used daily by several linguists to bootstrap large thematic corpora, which are then used to automatically generate bilingual terminologies.
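To illustrate what a focused crawler does in general, here is a minimal best-first sketch. It is not the TTC crawler's actual design. The in-memory `WEB` dictionary, the keyword-overlap `relevance` score, the `threshold` and `max_pages` parameters, and the `focused_crawl` function are all hypothetical, chosen only to keep the example self-contained and offline. The idea it shows is that the crawler always expands the most promising URL next, and it discards off-topic pages without following their links.

```python
import heapq
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this stub keeps the sketch offline.
WEB = {
    "a": ("wind turbine blade design and rotor", ["b", "c"]),
    "b": ("football match results", ["d"]),
    "c": ("turbine rotor maintenance for wind energy", ["d", "e"]),
    "d": ("celebrity news", []),
    "e": ("offshore wind energy farms", []),
}


def relevance(text, keywords):
    """Assumed scoring: fraction of (lowercase, non-empty) topic keywords found in the page."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sum(1 for k in keywords if k in tokens) / len(keywords)


def focused_crawl(seeds, keywords, threshold=0.25, max_pages=100):
    """Best-first crawl: pop the highest-priority URL, keep it only if it is
    on-topic, and enqueue its unseen links."""
    frontier = [(-1.0, url) for url in seeds]  # negated scores turn heapq into a max-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    corpus = []
    while frontier and len(corpus) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = WEB.get(url, ("", []))
        score = relevance(text, keywords)
        if score < threshold:
            continue  # off-topic page: neither stored nor expanded
        corpus.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Heuristic: a child inherits its parent's relevance as its priority.
                heapq.heappush(frontier, (-score, link))
    return corpus


if __name__ == "__main__":
    print(focused_crawl(["a"], ["wind", "turbine", "rotor", "energy"]))
```

With these toy pages, the crawl keeps `a`, `c` and `e`. Page `d` is never stored, because it is off-topic. Page `b` is fetched but not expanded. Letting children inherit their parent's score is one simple heuristic. A production crawler would more likely score links from anchor text or a trained classifier, and it would also respect politeness rules such as `robots.txt`.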