Frequent pattern-growth approach for document organization

Authors:
Monika Akbar; Rafal A. Angryk
Affiliations:
Virginia Tech, Blacksburg, VA, USA; Montana State University, Bozeman, MT, USA
Venue:
Proceedings of the 2nd international workshop on Ontologies and information systems for the semantic web
Year:
2008

Abstract

In this paper, we propose a document clustering mechanism that depends on the appearance of frequent senses in the documents rather than on the co-occurrence of frequent keywords. Instead of representing each document as a collection of keywords, we use a document-graph that reflects a conceptual hierarchy of the keywords related to that document. We combine a graph mining approach with FP-growth, a well-known procedure from association rule mining, to discover the frequent subgraphs among the document-graphs. The similarity of two documents is measured by the number of frequent subgraphs appearing in their corresponding document-graphs. We believe that our approach allows us to cluster documents based on their senses more than on the actual keywords.
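
The similarity measure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it skips the FP-growth mining step entirely and starts from a hand-written list of candidate subgraphs. The toy data, the names `doc_graphs`, `candidates`, `frequent_subgraphs`, and `similarity`, and the reading of "similarity" as the number of frequent subgraphs shared by two documents are all assumptions made for illustration. Each document-graph is modeled as a set of edges between uniquely labeled concepts, so a subgraph is taken to appear in a document when its edge set is a subset of the document's edge set.

```python
from itertools import combinations

# Hypothetical toy data: each document-graph is a set of edges
# (concept, broader concept) taken from a concept hierarchy.
doc_graphs = {
    "d1": {("dog", "canine"), ("canine", "animal"),
           ("cat", "feline"), ("feline", "animal")},
    "d2": {("dog", "canine"), ("canine", "animal"), ("car", "vehicle")},
    "d3": {("car", "vehicle"), ("truck", "vehicle")},
}

# Hypothetical candidate subgraphs. In the paper these would come from the
# FP-growth based mining step, which this sketch does not implement.
candidates = [
    frozenset({("dog", "canine"), ("canine", "animal")}),
    frozenset({("car", "vehicle")}),
    frozenset({("cat", "feline"), ("feline", "animal")}),
]


def contains(doc_edges, subgraph):
    """Assumed containment test: every edge of the subgraph is in the document."""
    return subgraph <= doc_edges


def frequent_subgraphs(graphs, subgraphs, min_support):
    """Keep the subgraphs that appear in at least min_support document-graphs."""
    return [
        sg for sg in subgraphs
        if sum(contains(edges, sg) for edges in graphs.values()) >= min_support
    ]


def similarity(edges_a, edges_b, frequent):
    """Count the frequent subgraphs that appear in both document-graphs."""
    return sum(
        1 for sg in frequent
        if contains(edges_a, sg) and contains(edges_b, sg)
    )


frequent = frequent_subgraphs(doc_graphs, candidates, min_support=2)

# Pairwise similarities, which a clustering algorithm could then consume.
pairwise = {
    (a, b): similarity(doc_graphs[a], doc_graphs[b], frequent)
    for a, b in combinations(sorted(doc_graphs), 2)
}
print(pairwise)
```

With a support threshold of 2, the subgraph that occurs only in `d1` is discarded, so `d1` and `d3` share no frequent subgraph, while `d1` and `d2` are linked by a shared conceptual path even though their other keywords differ.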