Distributed SLCA-based XML keyword search by map-reduce

  • Authors:
  • Chenjing Zhang;Qiang Ma;Xiaoling Wang;Aoying Zhou

  • Affiliations:
  • College of Information Technology, Shanghai Ocean University, China and School of Computer Science and Technology, Fudan University, China;School of Computer Science and Technology, Fudan University, China;Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University;School of Computer Science and Technology, Fudan University, China and Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal University

  • Venue:
  • DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
  • Year:
  • 2010

Abstract

Large volumes of XML data arrive continually from new Web applications, and SLCA (Smallest Lowest Common Ancestor)-based XML keyword search is one of the most important approaches to retrieving information from such data. Previous approaches focus on building indexes over XML documents. In an information dissemination scenario, however, it is impossible to build an index in advance for continuous streams of XML documents. This paper addresses SLCA-based keyword search over continuous XML documents using the Map-Reduce mechanism. We use parallel algorithms to process large numbers of XML documents in a Hadoop environment. We design a distributed SLCA computation method in which each network node computes SLCAs independently and only a small amount of information needs to be transmitted between nodes. We built a real Hadoop environment and demonstrate the efficiency of our algorithms both analytically and experimentally.
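The idea of index-free, per-node SLCA computation can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each XML document fits in a single map task, matches keywords against element tags and whitespace-separated text tokens, and simulates the map, shuffle and reduce steps in one local process rather than on Hadoop. The names `slca`, `map_doc`, `reduce_doc` and `run_job` are hypothetical. A node counts as an SLCA when its subtree contains every keyword and no child subtree does.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def slca(root, keywords):
    """Return Dewey-style labels (tuples) of the SLCA nodes in one document."""
    full = (1 << len(keywords)) - 1  # bitmask with one bit per keyword
    results = []

    def visit(node, label):
        # Keywords matched by this node's tag or its own leading text.
        # Simplification: text after child elements (node.tail) is ignored.
        tokens = set((node.text or "").lower().split()) | {node.tag.lower()}
        mask = 0
        for i, kw in enumerate(keywords):
            if kw.lower() in tokens:
                mask |= 1 << i
        child_has_all = False
        for pos, child in enumerate(node):
            child_mask = visit(child, label + (pos,))
            child_has_all = child_has_all or child_mask == full
            mask |= child_mask
        # SLCA: subtree covers all keywords, but no child subtree does.
        if mask == full and not child_has_all:
            results.append(label)
        return mask

    visit(root, (0,))
    return results


def map_doc(doc_id, xml_text, keywords):
    """Map step (assumed shape): emit only the SLCA labels of one document."""
    for label in slca(ET.fromstring(xml_text), keywords):
        yield doc_id, label


def reduce_doc(doc_id, labels):
    """Reduce step (assumed shape): collect the labels found per document."""
    return doc_id, sorted(labels)


def run_job(docs, keywords):
    """Local stand-in for a Map-Reduce run over a dict of {doc_id: xml_text}."""
    shuffled = defaultdict(list)
    for doc_id, xml_text in docs.items():
        for key, label in map_doc(doc_id, xml_text, keywords):
            shuffled[key].append(label)
    return dict(reduce_doc(k, v) for k, v in shuffled.items())


if __name__ == "__main__":
    docs = {
        "d1": "<bib><paper><title>xml search</title><author>ma</author></paper>"
              "<paper><title>xml</title></paper></bib>",
        "d2": "<a><b>xml</b></a>",
    }
    print(run_job(docs, ["xml", "ma"]))
```

In this sketch the mapper emits only the labels of matching nodes, never the documents themselves, which mirrors the abstract's point that little information has to cross the network; documents with no SLCA emit nothing at all.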