Frequent pattern-growth approach for document organization

  • Authors:
  • Monika Akbar; Rafal A. Angryk

  • Affiliations:
  • Virginia Tech, Blacksburg, VA, USA; Montana State University, Bozeman, MT, USA

  • Venue:
  • Proceedings of the 2nd international workshop on Ontologies and information systems for the semantic web
  • Year:
  • 2008

Abstract

In this paper, we propose a document clustering mechanism that depends on the appearance of frequent senses in the documents rather than on the co-occurrence of frequent keywords. Instead of representing each document as a collection of keywords, we use a document-graph that reflects a conceptual hierarchy of the keywords related to that document. We combine a graph mining approach with FP-growth, one of the well-known association rule mining procedures, to discover the frequent subgraphs among the document-graphs. Document similarity is measured in terms of the number of frequent subgraphs appearing in the corresponding document-graphs. We believe that our novel approach allows us to cluster documents based more on their senses than on their actual keywords.
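The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The five document-graphs are invented, hypernym-style edge sets; a real system would derive them from a lexical hierarchy, and WordNet is only an assumed example. Each graph's edge set is treated as one transaction for a textbook FP-growth. Disconnected edge sets are filtered out after mining, which may differ from how the paper restricts patterns to subgraphs. Finally, `shared_subgraph_count` is an assumed reading of "number of frequent subgraphs appearing in the corresponding document-graphs". All function and variable names are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """One node of an FP-tree: an item (here, a graph edge), its count, and links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and node lists."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    support = {item: s for item, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Insert frequent items in descending-support order (ties broken by value).
        ordered = sorted((i for i in items if i in support), key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return support, header


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset of edges: support} for every frequent edge set."""
    support, header = build_tree(transactions, min_support)
    patterns = {}
    for item, item_support in support.items():
        pattern = suffix | {item}
        patterns[pattern] = item_support
        # Conditional pattern base: the prefix path above each node holding `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


def is_connected(edges):
    """Treat the edge set as an undirected graph and test that it is connected."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(adjacency)


def shared_subgraph_count(doc_a, doc_b, subgraphs):
    """Assumed similarity: how many frequent subgraphs occur in both document-graphs."""
    return sum(1 for g in subgraphs if g <= doc_a and g <= doc_b)


# Hypothetical document-graphs: sets of (broader concept, narrower concept) edges.
docs = {
    "d1": {("entity", "organism"), ("organism", "animal"), ("animal", "dog")},
    "d2": {("entity", "organism"), ("organism", "animal"), ("animal", "cat")},
    "d3": {("entity", "artifact"), ("artifact", "vehicle"), ("vehicle", "car")},
    "d4": {("organism", "animal"), ("entity", "artifact"), ("artifact", "vehicle")},
    "d5": {("organism", "animal"), ("artifact", "vehicle")},
}

frequent = fp_growth([(edges, 1) for edges in docs.values()], min_support=2)
subgraphs = [g for g in frequent if is_connected(g)]
print(shared_subgraph_count(docs["d1"], docs["d2"], subgraphs))  # 3
print(shared_subgraph_count(docs["d1"], docs["d3"], subgraphs))  # 0
```

With `min_support=2`, the edge set {organism→animal, artifact→vehicle} is frequent, since it occurs in d4 and d5. However, it is disconnected, so the connectivity filter drops it. After filtering, d1 and d2 share three frequent subgraphs, while d1 and d3 share none. Those counts would then feed whatever clustering step is applied on top.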