One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector with a distributed, highly scalable architecture. CoBWeb aims to collect a large number of documents per time period while observing operational and ethical limits in the crawling process. CoBWeb is part of the SIAM (Information Systems in Mobile Computing Environments) search engine, which is being implemented to support the Brazilian Web. Accordingly, several results related to the Brazilian Web are presented.