In this article, we propose a new approach for querying and indexing a database of trees, with specific applications to XML datasets. Our approach represents both the queries and the data using a sequential encoding and then employs a variant of the longest common subsequence (LCS) matching algorithm to retrieve the desired results. A key innovation is a series of inter-linked early pruning steps, coupled with a simple index structure, which together reduce the search space and eliminate a large number of false positive matches before the more expensive LCS matching algorithm is applied. We also present mechanisms that let the user specify constraints on the retrieved output, and we show how such constraints can be pushed deep into the retrieval process, leading to improved response times. The approach also supports the retrieval of approximate matches. Compared with state-of-the-art approaches, our algorithms are shown to process queries up to two to three orders of magnitude faster on several real datasets under realistic query workloads. Finally, we show that our approach is suitable for emerging multi-core server architectures when retrieving data for more expensive queries.
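The general idea of sequencing trees, pruning cheaply, and then running LCS can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not specify the sequential encoding, the pruning steps, or the index, so the preorder label encoding, the label-count filter, and every function name below are assumptions chosen for illustration. A subsequence match on labels is only a necessary condition for a structural match, so the results are best read as candidates that would still need verification.

```python
from collections import Counter

def preorder_labels(tree):
    """Flatten a (label, [children]) tree into its preorder label sequence.
    Assumed encoding; the article's actual encoding is not given here."""
    label, children = tree
    seq = [label]
    for child in children:
        seq.extend(preorder_labels(child))
    return seq

def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for the LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def passes_label_filter(query_seq, data_seq):
    """Hypothetical cheap pruning step: every query label must occur
    at least as often in the data sequence."""
    need, have = Counter(query_seq), Counter(data_seq)
    return all(have[label] >= count for label, count in need.items())

def candidate_matches(query_tree, data_trees, max_missing=0):
    """Return indices of data trees whose sequence shares an LCS with the
    query sequence of length >= len(query) - max_missing."""
    q = preorder_labels(query_tree)
    hits = []
    for i, tree in enumerate(data_trees):
        d = preorder_labels(tree)
        # The label filter is only safe for exact matching.
        if max_missing == 0 and not passes_label_filter(q, d):
            continue
        if lcs_length(q, d) >= len(q) - max_missing:
            hits.append(i)
    return hits

query = ("a", [("b", []), ("c", [])])
data = [
    ("a", [("b", [("d", [])]), ("c", [])]),  # sequence: a b d c
    ("a", [("c", []), ("b", [])]),           # sequence: a c b
    ("x", [("b", [])]),                      # sequence: x b
]
exact = candidate_matches(query, data)
approximate = candidate_matches(query, data, max_missing=1)
```

In this sketch the third tree is rejected by the label filter before any LCS computation, which is the role early pruning plays in the description above. Setting `max_missing` above zero relaxes the LCS threshold, a simple stand-in for approximate matching.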