GRAMS3: an efficient framework for XML structural similarity search

  • Authors:
  • Peisen Yuan;Xiaoling Wang;Chaofeng Sha;Ming Gao;Aoying Zhou

  • Affiliations:
  • School of Computer Science, Fudan University, Shanghai, P.R. China and Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, P.R. China;Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, Shanghai, P.R. China;School of Computer Science, Fudan University, Shanghai, P.R. China and Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, P.R. China;School of Computer Science, Fudan University, Shanghai, P.R. China and Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, P.R. China;Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, P.R. China and Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal Universi ...

  • Venue:
  • DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
  • Year:
  • 2010

Abstract

Structural similarity search is a fundamental technology for XML data management. However, existing methods do not scale well to large volumes of XML documents. The pq-gram is an efficient way of extracting substructures from tree-structured data for approximate structural similarity search. In this paper, we propose GRAMS3, an effective framework for evaluating the structural similarity of XML data. First, the pq-grams of each XML document are extracted; then we study the characteristics of the pq-grams of XML and generate a doc-gram vector for each XML tree using the TGF-IGF model; finally, we employ locality-sensitive hashing for efficient structural similarity search over XML documents. An empirical study using both synthetic and real datasets demonstrates that the framework is efficient.
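
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: pq-grams are extracted from element tags only, using the usual null-padded stem and base construction; TGF-IGF is treated as analogous to TF-IDF (gram count in a document times a smoothed log inverse document frequency); and the hash family is random-hyperplane LSH for cosine similarity. The function names `pq_grams`, `weight_vectors`, `signature` and `hamming` are hypothetical.

```python
import math
import random
import xml.etree.ElementTree as ET
from collections import Counter

NULL = "*"  # dummy label used to pad stems and bases


def pq_grams(xml_text, p=2, q=3):
    """Return the multiset of pq-grams of an XML document's element tree."""
    root = ET.fromstring(xml_text)
    profile = Counter()

    def visit(node, stem):
        stem = stem[1:] + [node.tag]        # shift the anchor node into the stem
        base = [NULL] * q
        children = list(node)
        if not children:                     # leaf: one pq-gram with an all-null base
            profile[tuple(stem + base)] += 1
            return
        for child in children:               # slide a window of width q over the children
            base = base[1:] + [child.tag]
            profile[tuple(stem + base)] += 1
            visit(child, stem)
        for _ in range(q - 1):               # trailing null padding
            base = base[1:] + [NULL]
            profile[tuple(stem + base)] += 1

    visit(root, [NULL] * p)
    return profile


def weight_vectors(profiles):
    """Assumed TF-IDF-style weighting: count * log(1 + N / document frequency)."""
    n = len(profiles)
    df = Counter(g for prof in profiles for g in prof)  # documents containing each gram
    return [{g: c * math.log(1 + n / df[g]) for g, c in prof.items()}
            for prof in profiles]


def signature(vec, bits=32, seed=0):
    """Random-hyperplane signature: one bit per hyperplane (sign of the dot product)."""
    sig = 0
    for i in range(bits):
        dot = 0.0
        for gram, w in vec.items():
            # Reproducible Gaussian component for the pair (hyperplane i, gram).
            rng = random.Random(f"{seed}:{i}:{gram}")
            dot += w * rng.gauss(0.0, 1.0)
        sig = (sig << 1) | (dot >= 0)
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


docs = ["<a><b/><c/></a>", "<a><b/><c/></a>", "<x><y><z/></y></x>"]
vectors = weight_vectors([pq_grams(d) for d in docs])
signatures = [signature(v) for v in vectors]
print(hamming(signatures[0], signatures[1]))  # identical documents: 0
print(hamming(signatures[0], signatures[2]))
```

Under these assumptions, documents whose weighted pq-gram vectors have a small angle between them tend to agree on most signature bits, so candidates can be found by comparing or bucketing short signatures instead of comparing full trees.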