In this paper, we propose a document clustering mechanism that depends on the appearance of frequent senses in the documents rather than on the co-occurrence of frequent keywords. Instead of representing each document as a collection of keywords, we use a document-graph that reflects a conceptual hierarchy of the keywords related to that document. We combine a graph mining approach with FP-growth, a well-known association rule mining procedure, to discover the frequent subgraphs among the document-graphs. The similarity of the documents is measured in terms of the number of frequent subgraphs that appear in the corresponding document-graphs. We believe that our novel approach allows us to cluster the documents based more on their senses than on the actual keywords.
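
The similarity measure described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each document-graph is a set of (child sense, parent sense) edges, treats single edges as the only candidate subgraphs, and replaces FP-growth with plain support counting. It also assumes that similarity counts the frequent subgraphs shared by two documents. The toy data and the names `doc_graphs`, `frequent_edges`, and `similarity` are hypothetical.

```python
from collections import Counter

# Hypothetical toy document-graphs: each document is a set of directed
# edges (child_sense, parent_sense) taken from an assumed concept hierarchy.
doc_graphs = {
    "d1": {("dog", "canine"), ("canine", "mammal"), ("mammal", "animal")},
    "d2": {("cat", "feline"), ("feline", "mammal"), ("mammal", "animal")},
    "d3": {("dog", "canine"), ("canine", "mammal"), ("oak", "tree")},
}


def frequent_edges(graphs, min_support):
    """Return the edges that occur in at least min_support document-graphs.

    Simplification: only single-edge subgraphs are considered, and support
    is counted directly instead of being mined with FP-growth.
    """
    counts = Counter(edge for edges in graphs.values() for edge in edges)
    return {edge for edge, n in counts.items() if n >= min_support}


def similarity(graphs, frequent, a, b):
    """Count the frequent single-edge subgraphs present in both documents."""
    return len(graphs[a] & graphs[b] & frequent)


frequent = frequent_edges(doc_graphs, min_support=2)
for a, b in [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]:
    print(a, b, similarity(doc_graphs, frequent, a, b))
```

In this toy data, d1 and d3 share two frequent edges while d2 and d3 share none, so a clustering step built on these counts would place d1 closer to d3 than d2 is to d3, even though the documents are compared by senses in the hierarchy rather than by raw keywords.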