An adaptive path index for XML data using the query workload

Authors:
Jun-Ki Min;Chin-Wan Chung;Kyuseok Shim
Affiliations:
Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, 373-1,Kusong-dong, Yusong-gu, Taejon 305-701, Republic ...;Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, 373-1,Kusong-dong, Yusong-gu, Taejon 305-701, Republic ...;School of Electrical Engineering and Computer Science, Seoul National University, San 56-1, Shilim-dong, Kwanak-gu, Seoul 151-742, Republic of Korea
Venue:
Information Systems
Year:
2005

Citing 18
Cited 1

Access support relations: an indexing method for object bases

Information Systems - Data bases: their creation, management, and utilization
A query language and optimization techniques for unstructured data

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
A query language for a Web-site management system

ACM SIGMOD Record
Extensible markup language

World Wide Web Journal - Special issue on XML: principles, tools, and techniques
Exploratory mining and pruning optimizations of constrained associations rules

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
On supporting containment queries in relational database management systems

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Representative Objects: Concise Representations of Semistructured, Hierarchial Data

ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Optimizing Regular Path Expressions Using Graph Schemas

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Object Exchange Across Heterogeneous Information Sources

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Querying Semi-Structured Data

ICDT '97 Proceedings of the 6th International Conference on Database Theory
Adding Structure to Unstructured Data

ICDT '97 Proceedings of the 6th International Conference on Database Theory
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Query Optimization for XML

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Indexing and Querying XML Data for Regular Path Expressions

Proceedings of the 27th International Conference on Very Large Data Bases
A Fast Index for Semistructured Data

Proceedings of the 27th International Conference on Very Large Data Bases
Exploiting Local Similarity for Indexing Paths in Graph-Structured Data

ICDE '02 Proceedings of the 18th International Conference on Data Engineering

Indexing tree structures through caterpillar decomposition

SCIA'11 Proceedings of the 17th Scandinavian conference on Image analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Due to its flexibility, XML is becoming the de facto standard for exchanging and querying documents over the Web. Many XML query languages such as XQuery and XPath use label paths to traverse the irregularly structured XML data. Without a structural summary and efficient indexes, query processing can be quite inefficient due to an exhaustive traversal on XML data. To overcome the inefficiency, several path indexes have been proposed in the research community. Traditional indexes generally record all label paths from the root element in XML data and are constructed with the use of data only. Such path indexes may result in performance degradation due to large sizes and exhaustive navigations for partial matching path queries which start with the self-or-descendent axis(''//''). To improve the query performance, we propose an adaptive path index for XML data (termed APEX). APEX does not keep all paths starting from the root and utilizes frequently used paths on query workloads. APEX also has a nice property that it can be updated incrementally according to the changes of query workloads. Experimental results with synthetic and real-life data sets clearly confirm that APEX improves the query processing cost typically 2-69 times compared with the traditional indexes, with the performance gap increasing with the irregularity of XML data.