High-quality structured data from structured Web sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are accessible only through Web query forms or Web service interfaces. Recent research has focused on understanding these Web query forms. A critical but still largely unresolved question is how to efficiently acquire the structured information inside Web databases by iteratively issuing meaningful queries. In this paper we focus on the central issue of enabling efficient Web database crawling through query selection, i.e., how to select good queries to rapidly harvest data records from Web databases. We model each structured Web database as a distinct attribute-value graph. Under this theoretical framework, the database crawling problem is transformed into a graph traversal problem that follows "relational" links. We show that finding an optimal query selection plan is equivalent to finding a Minimum Weighted Dominating Set of the corresponding database graph, a well-known NP-complete problem. We propose a suite of query selection techniques that aim to optimize the query harvest rate. Extensive experimental evaluations over real Web sources and simulations over controlled database servers validate the effectiveness of our techniques and provide insights for future efforts in this
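
To make the attribute-value graph and the dominating-set view concrete, the following is a minimal sketch and not the paper's own query selection techniques. It assumes that each record is a flat attribute-to-value mapping, that two attribute values are linked when they co-occur in a record, and that each query has a positive cost supplied by a hypothetical `weight` function. The selection rule is the generic greedy heuristic for weighted dominating set (pick the value that covers the most not-yet-covered values per unit cost), which only approximates the optimum since the exact problem is NP-complete.

```python
def build_av_graph(records):
    """Build an attribute-value graph: nodes are (attribute, value) pairs,
    and two nodes are adjacent when they co-occur in at least one record."""
    adj = {}
    for rec in records:
        nodes = list(rec.items())
        for u in nodes:
            adj.setdefault(u, set()).update(v for v in nodes if v != u)
    return adj


def greedy_query_plan(adj, weight):
    """Greedy approximation of a minimum weighted dominating set.

    `weight(node)` is an assumed, strictly positive query cost.
    Returns the chosen nodes (queries) in selection order."""
    undominated = set(adj)
    plan = []
    while undominated:
        # Harvest-rate-like score: newly covered values per unit cost.
        best = max(
            adj,
            key=lambda v: len(({v} | adj[v]) & undominated) / weight(v),
        )
        plan.append(best)
        undominated -= {best} | adj[best]
    return plan


# Toy database, invented purely for illustration.
records = [
    {"author": "A", "year": "2001"},
    {"author": "A", "year": "2002"},
    {"author": "B", "year": "2002"},
]
graph = build_av_graph(records)
plan = greedy_query_plan(graph, weight=lambda node: 1.0)
```

With uniform costs, every attribute value in the toy graph ends up either in `plan` or adjacent to a value in `plan`, which is exactly the dominating-set condition; a non-uniform `weight` (for example, one proportional to the expected number of result pages) would shift the choice toward cheaper queries.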