The rapid adoption of XML as the standard for data representation and exchange foreshadows a massive increase in the amount of XML data collected, maintained, and queried over the Internet or in large corporate data stores. Inevitably, this will lead to on-line decision support systems in which users and analysts interactively explore large XML data sets through a declarative query interface (e.g., XQuery or XSLT). Because such systems must remain interactive, they can use approximate query answers as an effective mechanism for reducing response time and giving users early feedback. This approach has been used successfully in relational systems, and it is even more compelling in the XML world, where evaluating complex queries over massive tree-structured data is inherently more expensive.

In this paper, we initiate a study of approximate query answering techniques for large XML databases. Our approach is based on a novel, conceptually simple, yet very effective XML-summarization mechanism: TREESKETCH synopses. We demonstrate that, unlike earlier techniques that focus solely on selectivity estimation, TREESKETCH synopses are much more effective at capturing the complete tree structure of the underlying XML database. We propose novel construction algorithms for building TREESKETCH summaries of limited size, and describe schemes for processing general XML twig queries over a concise TREESKETCH to produce very fast, approximate tree-structured query answers. To quantify the quality of such approximate answers, we propose a novel, intuitive error metric that captures the quality of the approximation in terms of both the overall structure of the XML tree and the distribution of document edges.
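The abstract does not spell out how a TREESKETCH is built or queried. As a rough illustration of the general idea behind count-based structural summaries, the Python sketch below groups elements by tag only. This is a deliberately coarse, hypothetical partition and not the TREESKETCH construction algorithm. For each summary edge it records the average number of children, then multiplies those averages to estimate how many elements a simple child-axis path reaches. All names (`Node`, `build_label_summary`, `estimate_path`, `exact_path`, `example_document`) are illustrative assumptions, and real twig queries with branching are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal XML element: a tag name plus its ordered child elements."""
    label: str
    children: list = field(default_factory=list)


def build_label_summary(root):
    """Hypothetical tag-only summary: one summary node per tag, and for every
    (parent tag, child tag) edge the average number of such children per parent."""
    extent = defaultdict(int)       # tag -> number of elements with that tag
    child_total = defaultdict(int)  # (parent tag, child tag) -> total child count
    stack = [root]
    while stack:
        node = stack.pop()
        extent[node.label] += 1
        for child in node.children:
            child_total[(node.label, child.label)] += 1
            stack.append(child)
    avg = {(p, c): total / extent[p] for (p, c), total in child_total.items()}
    return dict(extent), avg


def estimate_path(extent, avg, path):
    """Estimate matches of the child-axis path //path[0]/path[1]/... by multiplying
    the element count of the first tag by the average fan-out along each edge."""
    estimate = float(extent.get(path[0], 0))
    for parent, child in zip(path, path[1:]):
        estimate *= avg.get((parent, child), 0.0)
    return estimate


def exact_path(root, path):
    """Exact number of elements reached by the same path, for comparison."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.label == path[0]:
            frontier = [node]
            for tag in path[1:]:
                frontier = [c for f in frontier for c in f.children if c.label == tag]
            count += len(frontier)
    return count


def example_document():
    """A small tree in which b elements under r and under a have different fan-outs."""
    return Node("r", [
        Node("a", [Node("b", [Node("c")]), Node("b")]),
        Node("a", [Node("b", [Node("c"), Node("c")])]),
        Node("b", [Node("c"), Node("c"), Node("c"), Node("c")]),
    ])


if __name__ == "__main__":
    doc = example_document()
    extent, avg = build_label_summary(doc)
    path = ["r", "a", "b", "c"]
    print(exact_path(doc, path), estimate_path(extent, avg, path))  # prints: 3 5.25
```

On this example the tag-only summary estimates 5.25 matches for `//r/a/b/c`, while the exact count is 3. The `b` elements directly under `r` have more `c` children than those under `a`, and a single average blurs the two groups. Reducing this kind of error within a fixed space budget is what a dedicated synopsis construction algorithm would have to do. The sketch makes no attempt at it.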
Experimental results on real-life and synthetic data sets verify the effectiveness of our TREESKETCH synopses in producing fast, accurate approximate answers and demonstrate their benefits over previously proposed techniques that focus solely on selectivity estimation. In particular, TREESKETCHes yield faster, more accurate approximate answers and selectivity estimates, and are more efficient to construct. To the best of our knowledge, ours is the first work to address the timely problem of producing fast, approximate tree-structured answers for complex XML queries.