LCS-TRIM: dynamic programming meets XML indexing and querying

Authors:
Shirish Tatikonda; Srinivasan Parthasarathy; Matthew Goyder
Affiliations:
The Ohio State University, Columbus, OH (all authors)
Venue:
VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
Year:
2007

Abstract

In this article, we propose a new approach for querying and indexing a database of trees, with specific applications to XML datasets. Our approach represents both the queries and the data using a sequential encoding. It then employs an innovative variant of the longest common subsequence (LCS) matching algorithm to retrieve the desired results. A key innovation is a series of inter-linked early pruning steps, coupled with a simple index structure, that enable us to reduce the search space and eliminate a large number of false-positive matches before applying the more expensive LCS matching algorithm. We also present mechanisms that let the user specify constraints on the retrieved output, and we show how such constraints can be pushed deep into the retrieval process, leading to improved response times. Approximate matches can also be retrieved. Compared with state-of-the-art approaches, our algorithms process queries up to two to three orders of magnitude faster on several real datasets under realistic query workloads. Finally, we show that our approach is suitable for emerging multi-core server architectures when retrieving data for more expensive queries.
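The abstract describes the retrieval pipeline only at a high level. The following is a minimal sketch of its general shape, and it rests on several assumptions:

- Trees are encoded as pre-order label sequences. The paper's actual encoding may differ.
- A hypothetical inverted label index and a label-count upper bound stand in for the paper's pruning steps.
- Matching uses the textbook LCS dynamic program, not the authors' variant.
- An optional `constraint` predicate is checked before any LCS work, to mimic pushing constraints into retrieval.
- Setting `min_ratio` below 1.0 admits approximate matches.

All function names (`preorder`, `lcs_length`, `build_index`, `search`) are illustrative only.

```python
from collections import Counter, defaultdict


def preorder(tree):
    """Flatten a (label, children) tree into its pre-order label sequence.

    Assumed encoding for illustration; not necessarily the paper's."""
    label, children = tree
    seq = [label]
    for child in children:
        seq.extend(preorder(child))
    return seq


def lcs_length(a, b):
    """Textbook O(len(a) * len(b)) LCS dynamic program, keeping two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def build_index(sequences):
    """Hypothetical inverted index: label -> ids of sequences containing it."""
    index = defaultdict(set)
    for doc_id, seq in sequences.items():
        for label in seq:
            index[label].add(doc_id)
    return index


def search(query_seq, sequences, index, min_ratio=1.0, constraint=None):
    """Return sorted ids whose LCS with query_seq is >= min_ratio * len(query_seq).

    min_ratio=1.0 demands that query_seq be a subsequence of the data sequence;
    smaller values admit approximate matches. `constraint` is an optional
    predicate on a data sequence, evaluated before any LCS work.
    """
    need = min_ratio * len(query_seq)
    q_counts = Counter(query_seq)
    # Prune 1: only sequences sharing at least one label with the query
    # (assumes min_ratio > 0, so a match needs at least one shared label).
    candidates = set()
    for label in q_counts:
        candidates |= index.get(label, set())
    results = []
    for doc_id in candidates:
        seq = sequences[doc_id]
        # Prune 2: the user constraint is pushed ahead of the expensive step.
        if constraint is not None and not constraint(seq):
            continue
        # Prune 3: LCS can never exceed the size of the label-multiset
        # intersection, so discard candidates whose bound is already too small.
        d_counts = Counter(seq)
        bound = sum(min(n, d_counts[lab]) for lab, n in q_counts.items())
        if bound < need:
            continue
        # Only survivors pay for the quadratic LCS computation.
        if lcs_length(query_seq, seq) >= need:
            results.append(doc_id)
    return sorted(results)


if __name__ == "__main__":
    data = {
        1: preorder(("book", [("title", []), ("author", []), ("year", [])])),
        2: preorder(("book", [("author", []), ("title", [])])),
        3: preorder(("article", [("title", []), ("journal", [])])),
    }
    idx = build_index(data)
    q = preorder(("book", [("title", []), ("year", [])]))
    print(search(q, data, idx))                 # [1]
    print(search(q, data, idx, min_ratio=0.6))  # [1, 2]
```

This sketch compares label order only and ignores parent-child structure. As a result, it can report false positives that a complete method would still have to eliminate. For example, a data tree whose `year` is nested under `title` would match the same query here. The count-based bound illustrates why cheap filters matter: every candidate it discards avoids an O(mn) dynamic program.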