Due to limited bandwidth, storage, and computational resources, and to the dynamic nature of the Web, search engines cannot index every Web page, and even the covered portion of the Web cannot be monitored continuously for changes. Therefore, it is essential to develop effective crawling strategies that prioritize the pages to be indexed. The issue is even more important for topic-specific search engines, where crawlers must make additional decisions based on the relevance of visited pages. However, alternative crawling strategies are difficult to evaluate because the relevant sets are unknown and the search space is changing. We propose three methods to evaluate crawling strategies and apply the resulting metrics to compare three topic-driven crawling algorithms based on similarity ranking, link analysis, and adaptive agents.