FRACTURE mining: mining frequently and concurrently mutating structures from historical XML documents

  • Authors: Ling Chen; Sourav S. Bhowmick; Liang-Tien Chia

  • Affiliations: School of Computer Engineering, Nanyang Technological University, Singapore, Singapore (all three authors)

  • Venue: Data & Knowledge Engineering - Special issue: WIDM 2004
  • Year: 2006

Abstract

In the past few years, the rapid proliferation of XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, no existing work on XML mining has taken into account the dynamic nature of XML documents as online information. This article proposes a novel type of frequent pattern, namely FRequently And Concurrently muTating substructUREs (FRACTURE), that is mined from the evolution of an XML document. A discovered FRACTURE is a set of substructures of an XML document that frequently change together. Knowledge obtained from FRACTUREs is useful in applications such as XML indexing and XML clustering. To keep the resulting patterns concise and explicit, we further formulate the problem of maximal FRACTURE mining. Two algorithms, which employ level-wise and divide-and-conquer strategies respectively, are designed to mine the set of FRACTUREs. The second algorithm, which is more efficient, is also optimized to discover the set of maximal FRACTUREs. Experiments on a wide range of synthetic and real-life datasets verify the efficiency and scalability of the developed algorithms.
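
To make the idea of "substructures that frequently change together" concrete, the following is a minimal sketch of a generic level-wise (Apriori-style) search, followed by a filter that keeps only maximal sets. It is not the algorithm from the article: the abstract does not give the article's definitions of change frequency or its thresholds, so the sketch assumes a simplified encoding in which each transition between two consecutive versions is a set of labels of the substructures that changed, and a set counts as frequent when the fraction of transitions containing it reaches `min_support`. The function names, the label encoding, and `min_support` are all hypothetical.

```python
from itertools import combinations


def mine_frequent_cochange_sets(deltas, min_support):
    """Level-wise search for sets of substructures that change together.

    deltas: list of sets; each set holds the labels of the substructures
            that changed between two consecutive versions (assumed encoding).
    min_support: minimum fraction of deltas that must contain a set.
    Returns a dict mapping each frequent frozenset to its support.
    """
    n = len(deltas)
    if n == 0:
        return {}

    def support(candidate):
        # Fraction of version transitions in which every member changed.
        return sum(1 for d in deltas if candidate <= d) / n

    # Level 1: single substructures that change often enough.
    current = {}
    for label in {label for d in deltas for label in d}:
        single = frozenset([label])
        s = support(single)
        if s >= min_support:
            current[single] = s

    frequent = dict(current)
    k = 2
    while current:
        previous = set(current)
        # Join step: merge two frequent (k-1)-sets into a k-set, and keep it
        # only if all of its (k-1)-subsets are frequent (downward closure).
        candidates = set()
        for a in previous:
            for b in previous:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in previous
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: keep the candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def maximal_only(frequent):
    """Keep only the frequent sets that have no frequent proper superset."""
    return {
        s: v
        for s, v in frequent.items()
        if not any(s < other for other in frequent)
    }


if __name__ == "__main__":
    # Four hypothetical version transitions of one document.
    deltas = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"d"}]
    frequent = mine_frequent_cochange_sets(deltas, min_support=0.5)
    for members, value in maximal_only(frequent).items():
        print(sorted(members), value)
```

In this toy input, seven sets meet the threshold, but the maximal filter reports only the largest one, which illustrates why restricting the output to maximal patterns keeps the result concise.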