Vectorizing and Querying Large XML Repositories

Authors:
Peter Buneman; Byron Choi; Wenfei Fan; Robert Hutchison; Robert Mann; Stratis D. Viglas
Affiliations:
University of Edinburgh (all authors)
Venue:
ICDE '05: Proceedings of the 21st International Conference on Data Engineering
Year:
2005

Abstract

Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, stores each column separately. We use a generalization of vectorization as the basis for a native XML store: an XML document is decomposed into a set of vectors that contain the data values and a compressed skeleton that describes the structure. To query this representation and produce results in the same vectorized format, we consider a practical fragment of XQuery and introduce query graphs together with a novel graph reduction algorithm. The algorithm lets us leverage relational optimization techniques and avoid unnecessary loading of data vectors and decompression of skeletons. A preliminary experimental study on scientific and synthetic XML data repositories on the order of gigabytes supports the claim that these techniques scale and have the potential to deliver performance comparable to established relational database technology.
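As a rough illustration of the decomposition described above, the Python sketch below splits a small XML document into per-path value vectors and a skeleton compressed by sharing identical subtrees and run-length encoding repeated siblings. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `vectorize` and `rebuild`, keying vectors by root-to-node tag path, and the `(tag, has_text, children)` node layout are all hypothetical choices, and attributes, mixed content, and whitespace between elements are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def vectorize(xml_text):
    """Hypothetical sketch: split XML into data vectors and a compressed skeleton.

    Returns (vectors, dag, root_id):
      vectors: maps a tag path such as '/db/star/name' to its text values,
               in document order.
      dag:     shared skeleton nodes (tag, has_text, children), where children
               is a tuple of (child_id, repeat_count) pairs.
    Simplification (assumption): attributes, tails and whitespace-only text are dropped.
    """
    root = ET.fromstring(xml_text)
    vectors = defaultdict(list)
    dag = []    # node id -> (tag, has_text, children)
    index = {}  # hash-consing table: node tuple -> node id

    def intern(node):
        # Identical skeleton subtrees are stored once and shared.
        if node not in index:
            index[node] = len(dag)
            dag.append(node)
        return index[node]

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            vectors[path].append(text)  # the value moves out of the skeleton
        children = []
        for child in elem:
            cid = walk(child, path)
            # Run-length encode consecutive identical children as (id, count).
            if children and children[-1][0] == cid:
                children[-1] = (cid, children[-1][1] + 1)
            else:
                children.append((cid, 1))
        return intern((elem.tag, bool(text), tuple(children)))

    root_id = walk(root, "")
    return dict(vectors), dag, root_id


def rebuild(vectors, dag, root_id):
    """Decompress the skeleton and re-attach values, reading each vector in order."""
    cursors = defaultdict(int)  # next unread position in each vector

    def expand(nid, path):
        tag, has_text, children = dag[nid]
        path = f"{path}/{tag}"
        elem = ET.Element(tag)
        if has_text:
            elem.text = vectors[path][cursors[path]]
            cursors[path] += 1
        for cid, count in children:
            for _ in range(count):
                elem.append(expand(cid, path))
        return elem

    return expand(root_id, "")


doc = """<db>
  <star><name>Sirius</name><mag>-1.46</mag></star>
  <star><name>Vega</name><mag>0.03</mag></star>
  <star><name>Rigel</name><mag>0.13</mag></star>
</db>"""

vectors, dag, root_id = vectorize(doc)
print(vectors["/db/star/name"])  # ['Sirius', 'Vega', 'Rigel']
print(len(dag))                  # 4 shared skeleton nodes for 10 elements
```

In this toy example the three `star` subtrees collapse into one shared skeleton node referenced with a repeat count of 3, and a path query over `name` values can be answered from a single vector without touching the skeleton. This loosely mirrors the abstract's point that avoiding unnecessary vector loading and skeleton decompression matters; how the paper's store actually encodes skeletons and evaluates queries is not shown here.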