Exploring XML web collections with DescribeX

Authors:
Mariano P. Consens;Renée J. Miller;Flavio Rizzolo;Alejandro A. Vaisman
Affiliations:
University of Toronto, Toronto, Canada;University of Toronto, Toronto, Canada;University of Ottawa and Carleton University;Universidad de Buenos Aires, Buenos Aires, Argentina
Venue:
ACM Transactions on the Web (TWEB)
Year:
2010

Abstract

As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge needed to visualize, use, query and manage documents. Even when XML Web documents are valid with respect to a schema, the actual structure of such documents may exhibit significant variations across collections for several reasons: the schema may be very lax (e.g., RSS feeds), the schema may be large and different subsets of it may be used in different documents (e.g., industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). For these reasons, many applications that use XPath queries to process a large Web document collection require an understanding of the actual structure present in the collection, and not just the schema. To support modern Web applications, we introduce DescribeX, a framework for describing complex XML summaries of Web collections. DescribeX supports the construction of heterogeneous summaries that can be declaratively defined and refined by means of axis path regular expressions (AxPREs). AxPREs provide the flexibility necessary for declaratively defining complex mappings between instance nodes (in the documents) and summary nodes. These mappings can express order and cardinality, among other properties, which can significantly aid in understanding the structure of large collections of XML documents and enhance the performance of Web applications over these collections. DescribeX captures most summary proposals in the literature by providing (for the first time) a common declarative definition for them.
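To make the idea of mapping instance nodes to summary nodes concrete, here is a minimal Python sketch of one of the simplest structural summaries, an incoming label-path summary (similar to a DataGuide over tree-shaped data). Every element is mapped to the summary node identified by its root-to-node label path, and the size of each summary node's extent gives a simple cardinality. This is not DescribeX's AxPRE machinery. The function name `path_summary` and the two toy RSS-like documents are hypothetical, chosen only to show how structure can vary across documents in one collection.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two toy RSS-like documents (hypothetical data): the second uses an
# optional <enclosure> element that the first one does not.
DOCS = [
    "<rss><channel><item><title/></item><item><title/></item></channel></rss>",
    "<rss><channel><item><title/><enclosure/></item></channel></rss>",
]


def path_summary(docs):
    """Map each element to its root-to-node label path (one summary node per path).

    Returns a Counter from label-path tuples to extent sizes, i.e. how many
    instance nodes across all documents map to each summary node.
    """
    extents = Counter()
    for text in docs:
        root = ET.fromstring(text)
        stack = [(root, (root.tag,))]
        while stack:  # iterative depth-first walk over the element tree
            node, path = stack.pop()
            extents[path] += 1
            for child in node:
                stack.append((child, path + (child.tag,)))
    return extents


if __name__ == "__main__":
    for path, size in sorted(path_summary(DOCS).items()):
        print("/".join(path), size)
    # rss 2
    # rss/channel 2
    # rss/channel/item 3
    # rss/channel/item/enclosure 1
    # rss/channel/item/title 3
```

The `enclosure` summary node, with an extent of size 1 against 3 `item` nodes, is the kind of structural variation that a schema alone would not reveal.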
Experimental results demonstrate the scalability of DescribeX summary operations (summary creation, as well as refinement and stabilization, two key enablers for tailoring summaries) on multi-gigabyte Web collections.
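Refinement splits summary nodes into finer ones until some stability condition holds. As a rough illustration only, the sketch below uses the classic naive partition-refinement loop rather than DescribeX's own AxPRE-based refinement and stabilization operations, and it assumes a single in-memory tree. It starts from the label partition and splits each block by the block of each node's parent until no block splits; on a tree this converges to the incoming label-path partition. The function name `refine_to_stable` is hypothetical.

```python
import xml.etree.ElementTree as ET


def refine_to_stable(root):
    """Naive partition refinement of the elements of one XML tree.

    Starts from the label partition and repeatedly splits every block by the
    block of each node's parent, stopping when a round creates no new block.
    Returns a dict mapping each element to an integer block id.
    """
    nodes = list(root.iter())  # all elements, in document order
    parent = {child: node for node in nodes for child in node}
    block = {n: n.tag for n in nodes}  # initial partition: by element label
    while True:
        ids = {}
        # New key = (own current block, parent's current block); the root has no parent.
        new_block = {
            n: ids.setdefault((block[n], block.get(parent.get(n))), len(ids))
            for n in nodes
        }
        # Each round refines the previous partition, so an unchanged block
        # count means the partition itself is unchanged (stable).
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


if __name__ == "__main__":
    root = ET.fromstring("<a><b><c/></b><d><c/></d></a>")
    blocks = refine_to_stable(root)
    first_c, second_c = root.iter("c")
    print(blocks[first_c] != blocks[second_c])  # True: the c elements are split by parent
```

The two `c` elements start in the same label block and end up in different blocks because their parents (`b` and `d`) are in different blocks. Because every round refines the previous partition, the block count can only grow and is bounded by the number of elements, so the loop terminates.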