XML structural similarity search using MapReduce

  • Authors:
  • Peisen Yuan;Chaofeng Sha;Xiaoling Wang;Bin Yang;Aoying Zhou;Su Yang

  • Affiliations:
  • School of Computer Science, Fudan University, P.R.C and Shanghai Key Laboratory of Intelligent Information Processing, P.R.C;School of Computer Science, Fudan University, P.R.C and Shanghai Key Laboratory of Intelligent Information Processing, P.R.C;Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, P.R.C;School of Computer Science, Fudan University, P.R.C and Shanghai Key Laboratory of Intelligent Information Processing, P.R.C;Shanghai Key Laboratory of Intelligent Information Processing, P.R.C and Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University, P.R.C;School of Computer Science, Fudan University, P.R.C and Shanghai Key Laboratory of Intelligent Information Processing, P.R.C

  • Venue:
  • WAIM'10 Proceedings of the 11th international conference on Web-age information management
  • Year:
  • 2010

Abstract

XML is a de facto standard for web data exchange and information representation. Managing large volumes of XML data efficiently poses challenges to conventional techniques. To cope with large-scale data, the MapReduce computing framework has recently attracted growing attention in the database community. This paper proposes an efficient and scalable framework for XML structural similarity search on a large cluster using MapReduce. First, sub-structures of the XML structures are extracted in parallel from a large XML corpus distributed across the cluster. Then Min-Hashing and locality-sensitive hashing techniques are developed on the distributed and parallel computing framework for efficient structural similarity search. An empirical study on a cluster with large real-world datasets demonstrates the effectiveness and efficiency of our approach.
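
The pipeline outlined in the abstract (sub-structure extraction, Min-Hashing, then locality-sensitive hashing) can be illustrated with a minimal single-process sketch. It is not the paper's implementation. The abstract does not say which sub-structures are extracted, so root-to-node tag paths are assumed here. The function names, the MD5-based hash family, and the parameters `num_hashes` and `bands` are likewise hypothetical choices. In a MapReduce setting, `extract_paths` and `minhash_signature` would plausibly run in the map phase, with the band keys produced in `lsh_buckets` acting as shuffle keys so that each reducer sees one bucket.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_paths(xml_text):
    """Return the set of root-to-node tag paths of one XML document.

    Assumption: tag paths stand in for the paper's "sub-structures".
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def _hash(seed, item):
    """One member of a seeded hash family (hypothetical choice: MD5)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def lsh_buckets(signatures, bands=8):
    """Split each signature into bands and group documents by band content."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    return buckets


def candidate_pairs(buckets):
    """Documents sharing at least one bucket become candidate pairs."""
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


if __name__ == "__main__":
    corpus = {
        "d1": "<book><title/><author><name/></author></book>",
        "d2": "<book><title/><author><name/></author></book>",
        "d3": "<order><item><price/></item></order>",
    }
    sigs = {d: minhash_signature(extract_paths(x)) for d, x in corpus.items()}
    print(candidate_pairs(lsh_buckets(sigs)))
```

Documents with identical path sets produce identical signatures and therefore always collide in every band. For partially overlapping documents, the fraction of matching signature positions estimates the Jaccard similarity of their path sets, and the number of bands trades recall against the number of candidate pairs that must be verified.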