Extracting structural features among words from document data streams

  • Authors:
  • Kumiko Ishida;Tomoyuki Uchida;Kayo Kawamoto

  • Affiliations:
  • Depart. of Computer and Media Tech., Hiroshima City University, Japan;Faculty of Information Sciences, Hiroshima City University, Japan;Faculty of Information Sciences, Hiroshima City University, Japan

  • Venue:
  • AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
  • Year:
  • 2006

Abstract

We consider the online data mining problem of continuously extracting all structural features among words from an infinite sequence of tree-structured documents. First, to represent structural features among words appearing in tree-structured documents, we introduce a consecutive path pattern (CPP, for short) on a list of words. A CPP is a sequence of consecutive paths from leaves to leaves. We then give a matching function over CPPs that takes into account the recent frequency of a CPP, the recency of a CPP, and the viewing time of the tree-structured document in which the CPP appears. Second, we present an online algorithm, based on a sliding window strategy, that continuously extracts all maximal CPPs as characteristic structural features from an infinite sequence of tree-structured documents. Finally, we report experimental results that show the good performance of the algorithm.
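The abstract does not give the formal definitions, so the following is only a minimal sketch of the general idea, under several assumptions that are not taken from the paper. A document is modeled as a nested `(tag, children)` tuple whose leaves are word strings. A pattern is taken to be the leaf-to-leaf tag paths between `k` adjacent leaves. A plain count over the most recent documents stands in for the matching function, so recency, viewing time, and maximality are all left out. The names `leaves`, `leaf_to_leaf`, `patterns`, and `SlidingWindowMiner` are hypothetical.

```python
from collections import Counter, deque


def leaves(tree, index=0, prefix=()):
    """Yield (word, path) for each leaf, left to right.

    A path is a tuple of (child index, tag) pairs from the root down to the
    leaf's parent, so two nodes with the same tag remain distinguishable.
    """
    tag, children = tree
    path = prefix + ((index, tag),)
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield child, path
        else:
            yield from leaves(child, i, path)


def leaf_to_leaf(path_a, path_b):
    """Tags met going up from leaf a to the lowest common ancestor, then down to leaf b."""
    common = 0
    while (common < min(len(path_a), len(path_b))
           and path_a[common] == path_b[common]):
        common += 1
    up = [tag for _, tag in reversed(path_a[common - 1:])]
    down = [tag for _, tag in path_b[common:]]
    return tuple(up + down)


def patterns(tree, k=2):
    """Yield (words, paths) for every run of k adjacent leaves (an assumed simplification)."""
    found = list(leaves(tree))
    for start in range(len(found) - k + 1):
        chunk = found[start:start + k]
        words = tuple(word for word, _ in chunk)
        paths = tuple(leaf_to_leaf(a[1], b[1]) for a, b in zip(chunk, chunk[1:]))
        yield words, paths


class SlidingWindowMiner:
    """Count patterns over the most recent `size` documents only."""

    def __init__(self, size, min_count):
        self.size = size
        self.min_count = min_count
        self.window = deque()
        self.counts = Counter()

    def add(self, tree):
        """Add one document and return the patterns that are frequent in the window."""
        current = set(patterns(tree))
        self.window.append(current)
        self.counts.update(current)
        if len(self.window) > self.size:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for pattern in expired:
                if self.counts[pattern] <= 0:
                    del self.counts[pattern]
        return {p for p, c in self.counts.items() if c >= self.min_count}
```

Each document is counted at most once per pattern, and the counts of the oldest document are subtracted when it leaves the window, so memory stays bounded by the window contents rather than by the length of the stream.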