The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
Pattern matching algorithms
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
The string edit distance matching problem with moves
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Dimension Reduction in the \ell _1 Norm
FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
An Approximate L1-Difference Algorithm for Massive Data Streams
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Algorithmic Applications of Low-Distortion Geometric Embeddings
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Efficient Filtering of XML Documents with XPath Expressions
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Efficient randomized pattern-matching algorithms
IBM Journal of Research and Development - Mathematics and computing
Comparing data streams using Hamming norms (how to zero in)
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Unordered Tree Mining with Applications to Phylogeny
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
XML stream processing using tree-edit distance embeddings
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
Similarity evaluation on tree-structured data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Approximate Processing of Massive Continuous Quantile Queries over High-Speed Data Streams
IEEE Transactions on Knowledge and Data Engineering
A fuzzy extension of the XPath query language
Journal of Intelligent Information Systems
Matchmaking OWL-S processes: an approach based on path signatures
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
The q-gram distance for ordered unlabeled trees
DS'05 Proceedings of the 8th international conference on Discovery Science
Indexing for subtree similarity-search using edit distance
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing an upper bound of O(log2 n log* n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to: (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and, (2) approximate the result of tree-edit distance similarity joins over continuous XML document streams. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.