Structural similarity search is a fundamental technology for XML data management. However, existing methods do not scale well to large volumes of XML documents. The pq-gram is an efficient way of extracting substructures from tree-structured data for approximate structural similarity search. In this paper, we propose GRAMS3, an effective framework for evaluating the structural similarity of XML data. First, the pq-grams of each XML document are extracted; then we study the characteristics of the pq-grams of XML and generate a doc-gram vector for each XML tree using the TGF-IGF model; finally, we employ locality sensitive hashing for efficient structural similarity search over XML documents. An empirical study using both synthetic and real datasets demonstrates that the framework is efficient.
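
The three steps named in the abstract (pq-gram extraction, a per-document gram vector, and locality sensitive hashing) can be illustrated with a minimal Python sketch. This is not the GRAMS3 implementation. It rests on several assumptions: trees are represented as hypothetical `(label, [children])` tuples; the pq-gram bag and distance follow the usual definition from the pq-gram literature; the vector uses raw gram counts as a stand-in, because the abstract does not define the TGF-IGF weights; and a SimHash-style bit signature is swapped in as one possible LSH family, since the abstract does not say which one is used. The function names `pq_grams`, `pq_gram_distance`, and `simhash` are illustrative only.

```python
import hashlib
from collections import Counter

NULL = "*"  # placeholder label for missing ancestors and children


def pq_grams(tree, p=2, q=3):
    """Return the bag (Counter) of pq-grams of a tree given as (label, [children])."""
    grams = Counter()

    def visit(node, ancestors):
        label, children = node
        stem = ancestors[1:] + (label,)   # the node plus its p-1 nearest ancestors
        base = (NULL,) * q                # sliding window over q consecutive children
        if not children:
            grams[stem + base] += 1       # a leaf contributes one all-null base
            return
        for child in children:
            base = base[1:] + (child[0],)
            grams[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):            # slide the window past the last child
            base = base[1:] + (NULL,)
            grams[stem + base] += 1

    visit(tree, (NULL,) * p)
    return grams


def pq_gram_distance(a, b):
    """1 - 2 * |bag intersection| / |bag union|, in [0, 1]; 0 for identical bags."""
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


def simhash(weights, bits=64):
    """SimHash-style signature (an assumed LSH choice); bits must be <= 64."""
    totals = [0.0] * bits
    for gram, w in weights.items():
        digest = hashlib.md5("\x1f".join(gram).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            totals[i] += w if (h >> i) & 1 else -w
    return sum(1 << i for i, t in enumerate(totals) if t > 0)


# Two small, hypothetical XML-like trees that differ in one leaf label.
doc_a = ("book", [("title", []), ("author", []), ("year", [])])
doc_b = ("book", [("title", []), ("author", []), ("price", [])])

grams_a, grams_b = pq_grams(doc_a), pq_grams(doc_b)
print("pq-gram distance:", pq_gram_distance(grams_a, grams_b))
# Raw counts stand in for the TGF-IGF weights, which the abstract does not define.
hamming = bin(simhash(grams_a) ^ simhash(grams_b)).count("1")
print("signature Hamming distance:", hamming)
```

In a setup like this, documents whose signatures differ in only a few bits would be treated as candidate matches and then verified with the exact pq-gram distance, so that most document pairs are never compared directly.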