Using a relational database for scalable XML search

Authors:
Rebecca J. Cathey;Steven M. Beitzel;Eric C. Jensen;David Grossman;Ophir Frieder
Affiliations:
Information Retrieval Laboratory, Department of Computer Science, Illinois Institute of Technology, Chicago, USA 60616;Information Retrieval Laboratory, Department of Computer Science, Illinois Institute of Technology, Chicago, USA 60616;Information Retrieval Laboratory, Department of Computer Science, Illinois Institute of Technology, Chicago, USA 60616;Information Retrieval Laboratory, Department of Computer Science, Illinois Institute of Technology, Chicago, USA 60616;Department of Computer Science, Georgetown University and IIT, Washington, USA 20057
Venue:
The Journal of Supercomputing
Year:
2008

Abstract

XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to manage the growing volumes of XML data effectively. Many methods exist for storing and searching XML data; the two most common are conventional tree-based approaches and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach uses a mature relational database to store and search XML: it maps XML queries to SQL and reconstructs the XML from the database results. To date, acceptance of the relational approach has been limited by the need to redesign the relational schema each time a new XML hierarchy is defined. We instead describe a relational approach that uses a fixed schema, eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree-based approach. To compare the scalability of the two approaches, we used the popular XBench benchmark to generate large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB. We measured the scalability of each method by running XML queries that cover a wide range of XML search features on each collection as the collection size increases. In addition, we examined the performance of each method as the result size and the number of predicates increase. Our results show that the relational approach provides scalable XML retrieval by leveraging existing relational database optimizations, and that it typically outperforms the tree-based approach while scaling consistently over all collections studied.
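
The fixed-schema idea can be illustrated with a small sketch. The abstract does not give the actual table layout or the query translation rules, so everything below is an assumption made for illustration: the `path` and `node` tables, the `shred` and `values_at` helpers, and the use of SQLite are hypothetical and are not the authors' system. The sketch only shows that two documents with unrelated hierarchies can share the same tables, and that a simple absolute path query can be answered with one SQL join.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fixed schema (an assumption, not the paper's layout):
# the same two tables hold any XML hierarchy, so no redesign is needed.
SCHEMA = """
CREATE TABLE path (path_id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY,
    doc_id    INTEGER,
    parent_id INTEGER,
    path_id   INTEGER REFERENCES path(path_id),
    value     TEXT
);
"""


def shred(conn, doc_id, xml_text):
    """Store every element of one document as a row in the fixed tables."""

    def visit(elem, parent_id, parent_path):
        path = parent_path + "/" + elem.tag
        # Register the root-to-element path once, then look up its id.
        conn.execute("INSERT OR IGNORE INTO path (path) VALUES (?)", (path,))
        path_id = conn.execute(
            "SELECT path_id FROM path WHERE path = ?", (path,)
        ).fetchone()[0]
        cur = conn.execute(
            "INSERT INTO node (doc_id, parent_id, path_id, value) "
            "VALUES (?, ?, ?, ?)",
            (doc_id, parent_id, path_id, (elem.text or "").strip()),
        )
        for child in elem:
            visit(child, cur.lastrowid, path)

    visit(ET.fromstring(xml_text), None, "")


def values_at(conn, path):
    """Answer a simple absolute path query (e.g. /catalog/item/title) in SQL."""
    rows = conn.execute(
        "SELECT n.doc_id, n.value FROM node n "
        "JOIN path p ON p.path_id = n.path_id "
        "WHERE p.path = ? ORDER BY n.node_id",
        (path,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Two documents with unrelated hierarchies go into the same tables.
shred(conn, 1, "<catalog><item><title>XML search</title></item></catalog>")
shred(conn, 2, "<order><customer><name>Ada</name></customer></order>")
print(values_at(conn, "/catalog/item/title"))
```

Storing the full root-to-element path as a string keeps the lookup to a single join here. A real system would also need to handle attributes, document order, predicates, and reconstruction of XML from result rows, none of which this sketch attempts.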