The Web has been rapidly "deepened" by the prevalence of online databases. With potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our "macro" study surveys the deep Web at large, in April 2004, adopting the random IP-sampling approach with one million samples. (How large is the deep Web? How is it covered by current directory services?) On the other hand, our "micro" study surveys source-specific characteristics over 441 sources in eight representative domains, in December 2002. (How "hidden" are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications (of our own) which, while necessarily subjective, might help shape research directions and solutions.