Comparing top-k XML lists

Authors:
Ramakrishna Varadarajan;Fernando Farfán;Vagelis Hristidis
Affiliations:
Hewlett-Packard, Billerica, MA 01821, United States;Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, United States;Department of Computer Science and Engineering, University of California, Riverside, CA 92521, United States
Venue:
Information Systems
Year:
2013

Abstract

Systems that produce ranked lists of results are abundant. For instance, Web search engines return ranked lists of Web pages. There has been work on distance measures for permutations, such as Kendall tau and Spearman's footrule, as well as extensions that handle top-k lists, which are more common in practice. Beyond ranking whole objects (e.g., Web pages), a growing number of systems provide keyword search on XML or other semistructured data and produce ranked lists of XML sub-trees. Unfortunately, previous distance measures are not suitable for ranked lists of sub-trees because they do not account for possible overlap between the returned sub-trees: two sub-trees differing by a single node would be considered separate objects. In this paper, we present the first distance measures for ranked lists of sub-trees, and show under what conditions these measures are metrics. Furthermore, we present algorithms to efficiently compute these distance measures. Finally, we evaluate and compare the proposed measures on real data using three popular XML keyword proximity search systems.
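
The classical measures named in the abstract can be illustrated with a minimal Python sketch. It assumes both lists are full permutations of the same items (not the top-k extensions), and it is not the sub-tree measures proposed in the paper. The function names, the sample item labels, and the representation of a sub-tree as a set of node identifiers are assumptions made here for illustration only.

```python
from itertools import combinations


def spearman_footrule(a, b):
    """Sum of absolute position differences between two permutations of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


def kendall_tau(a, b):
    """Number of item pairs that the two permutations order differently."""
    pos_b = {item: i for i, item in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked before y in a;
    # the pair is discordant when b ranks y before x.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


# Hypothetical rankings of four whole objects (e.g., Web pages).
list_a = ["p1", "p2", "p3", "p4"]
list_b = ["p2", "p1", "p4", "p3"]
print(spearman_footrule(list_a, list_b))  # 4
print(kendall_tau(list_a, list_b))        # 2

# The overlap problem: two hypothetical sub-trees, modeled as sets of node ids,
# that differ by a single node are still distinct objects under identity comparison.
subtree_x = frozenset({1, 2, 3, 4})
subtree_y = frozenset({1, 2, 3})
print(subtree_x == subtree_y)                                  # False
print(len(subtree_x & subtree_y) / len(subtree_x | subtree_y))  # 0.75
```

Both functions compare items by identity, so the two nearly identical sub-trees at the end would count as unrelated results even though they share three of four nodes. This is the gap the abstract describes.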