Structured databases on the web: observations and implications

Authors:
Kevin Chen-Chuan Chang; Bin He; Chengkai Li; Mitesh Patel; Zhen Zhang
Affiliations:
University of Illinois at Urbana-Champaign (all authors)
Venue:
ACM SIGMOD Record
Year:
2004

Abstract

The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our "macro" study surveys the deep Web at large in April 2004, adopting the random IP-sampling approach with one million samples. (How large is the deep Web? How well is it covered by current directory services?) On the other hand, our "micro" study surveys source-specific characteristics over 441 sources in eight representative domains in December 2002. (How "hidden" are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications of our own which, while necessarily subjective, may help shape research directions and solutions.
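The macro study's random IP-sampling approach amounts to a proportion estimate: probe a uniform random sample of addresses, count the ones that host what is being measured (for example, a Web database), and scale that hit rate up to the full address space. The Python sketch below is minimal and rests on a few assumptions. The sampling frame is taken to be the entire 2^32 IPv4 space, although a real survey might exclude reserved ranges. The confidence interval uses a plain normal approximation. The `has_source` probe is hypothetical and stands in for the crawling and query-form detection, which the abstract does not describe.

```python
import ipaddress
import math
import random
from typing import Callable, Sequence

# Assumption: sample from the whole IPv4 space; a real survey may exclude reserved ranges.
IPV4_SPACE = 2 ** 32


def sample_ips(n: int, space: int = IPV4_SPACE, seed: int = 0) -> list[int]:
    """Draw n distinct addresses (as integers) uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(space), n)


def to_dotted(ip: int) -> str:
    """Render an integer address in dotted-quad form, e.g. for logging probes."""
    return str(ipaddress.IPv4Address(ip))


def estimate_total(
    samples: Sequence[int],
    has_source: Callable[[int], bool],  # hypothetical probe: does this IP host a source?
    space: int = IPV4_SPACE,
    z: float = 1.96,  # ~95% two-sided normal quantile
) -> tuple[float, float, float]:
    """Scale the sample hit rate to the whole space; return (estimate, low, high)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    hits = sum(1 for ip in samples if has_source(ip))
    p = hits / n
    # Normal-approximation half-width for a binomial proportion.
    half = z * math.sqrt(p * (1 - p) / n)
    return p * space, max(0.0, p - half) * space, min(1.0, p + half) * space


if __name__ == "__main__":
    ips = sample_ips(10_000, seed=42)
    # Hypothetical stand-in probe; a real one would connect and look for query forms.
    simulated_probe = lambda ip: ip % 5000 == 0
    est, lo, hi = estimate_total(ips, simulated_probe)
    print(f"estimated hosts: {est:,.0f} (95% CI {lo:,.0f} to {hi:,.0f})")
```

The interval's half-width shrinks as 1/sqrt(n), so a larger sample, such as the study's one million addresses, tightens the estimate. The trade-off is more probing.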