Approximate Graph Schema Extraction for Semi-Structured Data

Authors:
Qiu Yue Wang;Jeffrey X. Yu;Kam-Fai Wong
Venue:
EDBT '00 Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2000

Abstract

Semi-structured data are typically represented as labeled directed graphs. They are self-describing and schemaless, and the lack of a schema makes query processing over them expensive. To address this, some researchers have proposed using the structure of the data itself as a schema; such schemas are commonly referred to as graph schemas. However, because semi-structured data are irregular and frequently modified, an accurate graph schema is costly to construct and, worse still, difficult to maintain. Furthermore, an accurate graph schema is generally very large and hence impractical. This paper proposes an approximation approach to graph schema extraction. Approximation is achieved by summarizing the semi-structured data graph using an incremental clustering method. Preliminary experimental results show that approximate graph schemas are more compact than conventional accurate graph schemas and are promising for evaluating queries that involve regular path expressions.
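
The abstract does not spell out the clustering procedure, so the following is only a rough illustration of the general idea of summarizing a labeled directed graph by incremental clustering, not the authors' algorithm. It assumes that nodes are compared by the Jaccard similarity of their outgoing edge labels and that a node joins the most similar existing cluster when that similarity reaches a threshold. The names `jaccard` and `approximate_schema`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def approximate_schema(edges, threshold=0.3):
    """Summarize a labeled directed graph by clustering its nodes one at a time.

    edges: list of (source, label, target) triples.
    Returns (cluster_of, schema_edges), where cluster_of maps each node to a
    cluster id and schema_edges is the set of labeled edges between clusters.
    The similarity measure and threshold are illustrative assumptions.
    """
    out_labels = defaultdict(set)
    nodes, seen = [], set()  # keep first-seen order so results are deterministic
    for src, label, dst in edges:
        out_labels[src].add(label)
        for node in (src, dst):
            if node not in seen:
                seen.add(node)
                nodes.append(node)

    cluster_labels = []  # per cluster: union of its members' outgoing labels
    cluster_of = {}
    for node in nodes:
        labels = out_labels.get(node, set())
        best, best_sim = None, -1.0
        for i, summary in enumerate(cluster_labels):
            sim = jaccard(labels, summary)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            cluster_labels[best] |= labels  # absorb the node into the cluster
            cluster_of[node] = best
        else:
            cluster_labels.append(set(labels))  # start a new cluster
            cluster_of[node] = len(cluster_labels) - 1

    # Each data edge is projected onto an edge between the endpoint clusters.
    schema_edges = {(cluster_of[s], l, cluster_of[d]) for s, l, d in edges}
    return cluster_of, schema_edges


# Hypothetical sample data: three "person" objects with irregular attributes.
edges = [
    ("root", "person", "p1"), ("root", "person", "p2"), ("root", "person", "p3"),
    ("p1", "name", "n1"), ("p1", "email", "e1"),
    ("p2", "name", "n2"), ("p2", "email", "e2"), ("p2", "phone", "t2"),
    ("p3", "name", "n3"),
]
cluster_of, schema_edges = approximate_schema(edges)
print(len(edges), "data edges summarized by", len(schema_edges), "schema edges")
```

In this sketch a lower threshold merges more nodes and yields a smaller but less precise summary, which mirrors the trade-off between compactness and accuracy that the abstract describes.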