Information retrieval in the World-Wide Web: making client-based searching feasible
Selected papers of the first conference on World-Wide Web
Adaptive information agents in distributed textual environments
AGENTS '98 Proceedings of the second international conference on Autonomous agents
Improved algorithms for topic distillation in a hyperlinked environment
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Improving automatic query expansion
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic resource compilation by analyzing hyperlink structure and associated text
WWW7 Proceedings of the seventh international conference on World Wide Web 7
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Efficient crawling through URL ordering
WWW7 Proceedings of the seventh international conference on World Wide Web 7
The shark-search algorithm. An application: tailored Web site mapping
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Measuring index quality using random walks on the Web
WWW '99 Proceedings of the eighth international conference on World Wide Web
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Adding support for dynamic and focused search with Fetuccino
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Real life, real users, and real needs: a study and analysis of user queries on the web
Information Processing and Management: an International Journal
Link-based and content-based evidential information in a belief network model
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Does “authority” mean quality? predicting expert quality ratings of Web documents
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Efficient identification of Web communities
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Adaptive Retrieval Agents: Internalizing Local Contextand Scaling up to the Web
Machine Learning - Special issue on information retrieval
Searching the Web: the public and their queries
Journal of the American Society for Information Science and Technology
Intelligent crawling on the World Wide Web with arbitrary predicates
Proceedings of the 10th international conference on World Wide Web
Breadth-first crawling yields high-quality pages
Proceedings of the 10th international conference on World Wide Web
Evaluating topic-driven web crawlers
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Accelerated focused crawling through online relevance feedback
Proceedings of the 11th international conference on World Wide Web
The structure of broad topics on the web
Proceedings of the 11th international conference on World Wide Web
Information Retrieval
MySpiders: Evolve Your Own Intelligent Web Crawlers
Autonomous Agents and Multi-Agent Systems
ARCCHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Using Reinforcement Learning to Spider the Web Efficiently
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Focused Crawling Using Context Graphs
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Complementing search engines with online web mining agents
Decision Support Systems - Special issue: Web data mining
Stochastic models for the Web graph
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Topical web crawlers: Evaluating adaptive algorithms
ACM Transactions on Internet Technology (TOIT)
Lexical and semantic clustering by web links
Journal of the American Society for Information Science and Technology - Special issue: Webometrics
Small world peer networks in distributed web search
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Topical web crawlers: Evaluating adaptive algorithms
ACM Transactions on Internet Technology (TOIT)
Probabilistic models for focused web crawling
Proceedings of the 6th annual ACM international workshop on Web information and data management
Learning to crawl: Comparing classification schemes
ACM Transactions on Information Systems (TOIS)
Link Contexts in Classifier-Guided Topical Crawlers
IEEE Transactions on Knowledge and Data Engineering
Generalizing PageRank: damping functions for link-based ranking algorithms
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Interest-based personalized search
ACM Transactions on Information Systems (TOIS)
Using HMM to learn user browsing patterns for focused web crawling
Data & Knowledge Engineering - Special issue: WIDM 2004
Repeatable evaluation of search services in dynamic environments
ACM Transactions on Information Systems (TOIS)
The impact of term selection in genre-aware focused crawling
Proceedings of the 2008 ACM symposium on Applied computing
SVM based adaptive learning method for text classification from positive and unlabeled documents
Knowledge and Information Systems
Exploiting Multiple Features with MEMMs for Focused Web Crawling
NLDB '08 Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
CRAWLING THE CONSTRUCTION WEB-A MACHINE-LEARNING APPROACH WITHOUT NEGATIVE EXAMPLES
Applied Artificial Intelligence
A three-year study on the freshness of web search engine databases
Journal of Information Science
A cross-language focused crawling algorithm based on multiple relevance prediction strategies
Computers & Mathematics with Applications
Reinforcement Learning with Classifier Selection for Focused Crawling
Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
A Genre-Aware Approach to Focused Crawling
World Wide Web
Adaptive geospatially focused crawling
Proceedings of the 18th ACM conference on Information and knowledge management
SCTWC: An online semi-supervised clustering approach to topical web crawlers
Applied Soft Computing
Towards a graph-based user profile modeling for a session-based personalized search
Knowledge and Information Systems
The adaptive web
Addressing the limited scope problem of focused crawling using a result merging approach
Proceedings of the 2010 ACM Symposium on Applied Computing
Exploiting genre in focused crawling
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
An effective relevance prediction algorithm based on hierarchical taxonomy for focused crawling
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Where to crawl next for focused crawlers
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part IV
PROBABILISTIC MODELS FOR FOCUSED WEB CRAWLING
Computational Intelligence
Turn the page: automated traversal of paginated websites
ICWE'12 Proceedings of the 12th international conference on Web Engineering
Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers
ACM Transactions on Information Systems (TOIS)
Domain specific search in indian languages
Proceedings of the first workshop on Information and knowledge management for developing region
Hi-index | 0.00 |
Topical crawlers are becoming important tools to support applications such as specialized Web portals, online searching, and competitive intelligence. As the Web mining field matures, the disparate crawling strategies proposed in the literature will have to be evaluated and compared on common tasks through well-defined performance measures. This paper presents a general framework to evaluate topical crawlers. We identify a class of tasks that model crawling applications of different nature and difficulty. We then introduce a set of performance measures for fair comparative evaluations of crawlers along several dimensions including generalized notions of precision, recall, and efficiency that are appropriate and practical for the Web. The framework relies on independent relevance judgements compiled by human editors and available from public directories. Two sources of evidence are proposed to assess crawled pages, capturing different relevance criteria. Finally we introduce a set of topic characterizations to analyze the variability in crawling effectiveness across topics. The proposed evaluation framework synthesizes a number of methodologies in the topical crawlers literature and many lessons learned from several studies conducted by our group. The general framework is described in detail and then illustrated in practice by a case study that evaluates four public crawling algorithms. We found that the proposed framework is effective at evaluating, comparing, differentiating and interpreting the performance of the four crawlers. For example, we found the IS crawler to be most sensitive to the popularity of topics.