Topical crawling is a young and creative area of research that holds the promise of benefiting from several sophisticated data mining techniques. The use of classification algorithms to guide topical crawlers has been suggested sporadically in the literature, but no systematic study has been done on their relative merits. Using the lessons learned from our previous crawler evaluation studies, we experiment with multiple versions of different classification schemes. The crawling process is modeled as a parallel best-first search over a graph defined by the Web. The classifiers provide heuristics to the crawler, thus biasing it towards certain portions of the Web graph. Our results show that Naive Bayes is a weak choice for guiding a topical crawler when compared with a Support Vector Machine or a Neural Network. Further, the weak performance of Naive Bayes can be partly explained by the extreme skewness of the posterior probabilities it generates. We also observe that, despite similar performance, different topical crawlers cover subspaces of the Web with low overlap.
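To make the idea of a classifier-guided best-first search concrete, the following is a minimal sketch, not the crawler evaluated in the study. It is sequential rather than parallel, it runs over a small in-memory graph instead of the Web, and it assumes that each unvisited link inherits the score of the page on which it was found. The names `best_first_crawl`, `TOY_WEB`, and `TOY_SCORES` are hypothetical, and the score table stands in for a trained classifier.

```python
import heapq

# Hypothetical toy "Web": each page maps to the pages it links to.
TOY_WEB = {
    "seed": ["a", "b"],
    "a": ["c", "d"],
    "b": ["e"],
    "c": [],
    "d": [],
    "e": [],
}

# Hypothetical relevance scores standing in for a classifier's output.
TOY_SCORES = {"seed": 0.5, "a": 0.9, "b": 0.1, "c": 0.8, "d": 0.7, "e": 0.2}


def best_first_crawl(graph, seeds, score, max_pages):
    """Visit up to max_pages pages, always expanding the best frontier entry.

    Assumption: a link's priority is the score of the page that linked to it.
    heapq is a min-heap, so priorities are stored negated.
    """
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    queued = set(seeds)  # pages already placed on the frontier
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        parent_score = score(url)  # "classify" the fetched page
        for link in graph.get(url, []):
            if link not in queued:
                queued.add(link)
                heapq.heappush(frontier, (-parent_score, link))
    return visited


if __name__ == "__main__":
    print(best_first_crawl(TOY_WEB, ["seed"], TOY_SCORES.get, max_pages=4))
```

With these toy scores, the links found on the high-scoring page `a` are expanded before the remaining link `b`, which is the biasing effect described above. Swapping in a different scoring function changes which part of the graph is explored first.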