In this paper we describe new adaptive crawling strategies to efficiently locate the entry points to hidden-Web sources. The fact that hidden-Web sources are very sparsely distributed makes the problem of locating them especially challenging. We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit. We propose a new framework whereby crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, thus greatly reducing the amount of required manual setup and tuning. Our experiments over real Web pages in a representative set of domains indicate that online learning leads to significant gains in harvest rates: the adaptive crawlers retrieve up to three times as many forms as crawlers that use a fixed focus strategy.
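The idea of prioritizing promising links and adapting that priority as the crawl progresses can be illustrated with a minimal sketch. It is not the method described in the paper: the class names `AdaptiveLinkScorer` and `Frontier`, the token-based success-rate score, and the add-one smoothing are all assumptions chosen for brevity. The sketch also covers only link prioritization and online updates, and it does not model topic classification of page contents or links with delayed benefit.

```python
import heapq
import re


def tokenize(text):
    """Split a URL or anchor text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class AdaptiveLinkScorer:
    """Hypothetical online learner: smoothed per-token success rates."""

    def __init__(self):
        self.hits = {}  # token -> followed links containing it that led to a form
        self.seen = {}  # token -> followed links containing it, regardless of outcome

    def score(self, link_text):
        """Average smoothed success rate over the link's tokens (0.5 if unknown)."""
        tokens = tokenize(link_text)
        if not tokens:
            return 0.5
        rates = [(self.hits.get(t, 0) + 1) / (self.seen.get(t, 0) + 2) for t in tokens]
        return sum(rates) / len(rates)

    def update(self, link_text, found_form):
        """Record the outcome of following one link."""
        for t in set(tokenize(link_text)):
            self.seen[t] = self.seen.get(t, 0) + 1
            if found_form:
                self.hits[t] = self.hits.get(t, 0) + 1


class Frontier:
    """Priority queue of unvisited links, ordered by the current scorer."""

    def __init__(self, scorer):
        self.scorer = scorer
        self.heap = []
        self.counter = 0  # tie-breaker that keeps insertion order for equal scores

    def push(self, url, link_text):
        # heapq is a min-heap, so the score is negated to pop the best link first.
        entry = (-self.scorer.score(link_text), self.counter, url, link_text)
        heapq.heappush(self.heap, entry)
        self.counter += 1

    def pop(self):
        _, _, url, link_text = heapq.heappop(self.heap)
        return url, link_text

    def rescore(self):
        """Re-rank queued links after the scorer has learned from new outcomes."""
        queued = [(url, text) for _, _, url, text in sorted(self.heap)]
        self.heap = []
        for url, text in queued:
            self.push(url, text)


scorer = AdaptiveLinkScorer()
frontier = Frontier(scorer)
frontier.push("http://example.com/about", "about company")
frontier.push("http://example.com/find", "advanced search")

# Feedback from earlier in the crawl: one link led to a form, one did not.
scorer.update("search jobs", found_form=True)
scorer.update("about us", found_form=False)
frontier.rescore()

next_url, _ = frontier.pop()
```

Before any feedback both queued links score 0.5 and would be visited in insertion order. After the two updates and the call to `rescore`, the link whose text contains "search" moves to the front, which is the adaptive behavior in miniature. A fixed-focus crawler would correspond to never calling `update`.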