Computational geometry: an introduction
Computational geometry: an introduction
The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Efficient processing of spatial joins using R-trees
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast subsequence matching in time-series databases
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Change detection in hierarchically structured information
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Meaningful change detection in structured data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Pattern matching algorithms
Pattern matching algorithms
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Multidimensional access methods
ACM Computing Surveys (CSUR)
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Accelerating XPath location steps
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Holistic twig joins: optimal XML pattern matching
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
High-Dimensional Similarity Joins
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
High Dimensional Similarity Joins: Algorithms and Performance Evaluation
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
An Implementation and Performance Analysis of Spatial Data Access Methods
Proceedings of the Fifth International Conference on Data Engineering
Efficient Computation of Spatial Joins
Proceedings of the Ninth International Conference on Data Engineering
Spatial Joins Using R-trees: Breadth-First Traversal with Global Optimizations
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
On Optimal Node Splitting for R-trees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Approximate String Joins in a Database (Almost) for Free
Proceedings of the 27th International Conference on Very Large Data Bases
Change-Centric Management of Versions in an XML Warehouse
Proceedings of the 27th International Conference on Very Large Data Bases
Hilbert R-tree: An Improved R-tree using Fractals
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Detecting Changes in XML Documents
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Structural Joins: A Primitive for Efficient XML Query Pattern Matching
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Matching XML documents in highly dynamic applications
Proceedings of the eighth ACM symposium on Document engineering
Evaluating Performance and Quality of XML-Based Similarity Joins
ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
Retrieving XML data from heterogeneous sources through vague querying
ACM Transactions on Internet Technology (TOIT)
A cluster-based approach to XML similarity joins
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
XML: some papers in a haystack
ACM SIGMOD Record
Generalizing prefix filtering to improve set similarity joins
Information Systems
Ingredients for accurate, fast, and robust XML similarity joins
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II
Similarity join on XML based on k-generation set distance
WAIM'11 Proceedings of the 2011 international conference on Web-Age Information Management
Leveraging the storage layer to support XML similarity joins in XDBMSs
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Hi-index | 0.00 |
XML is widely recognized as the data interchange standard of tomorrow because of its ability to represent data from a variety of sources. Hence, XML is likely to be the format through which data from multiple sources is integrated. In this article, we study the problem of integrating XML data sources through correlations realized as join operations. A challenging aspect of this operation is the XML document structure. Two documents might convey approximately or exactly the same information but may be quite different in structure. Consequently, an approximate match in structure, in addition to content, has to be folded into the join operation. We quantify an approximate match in structure and content for pairs of XML documents using well defined notions of distance. We show how notions of distance that have metric properties can be incorporated in a framework for joins between XML data sources and introduce the idea of reference sets to facilitate this operation. Intuitively, a reference set consists of data elements used to project the data space. We characterize what constitutes a good choice of a reference set, and we propose sampling-based algorithms to identify them. We then instantiate our join framework using the tree edit distance between a pair of trees. We next turn our attention to utilizing well known index structures to improve the performance of approximate XML join operations. We present a methodology enabling adaptation of index structures for this problem, and we instantiate it in terms of the R-tree. We demonstrate the practical utility of our solutions using large collections of real and synthetic XML data sets, varying parameters of interest, and highlighting the performance benefits of our approach.