GDClust: A Graph-Based Document Clustering Technique

Authors:
M. Shahriar Hossain; Rafal A. Angryk
Venue:
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Year:
2007

Cited by (12):

Frequent pattern-growth approach for document organization
Proceedings of the 2nd international workshop on Ontologies and information systems for the semantic web

An Abstraction-Based Data Model for Information Retrieval
AI '09 Proceedings of the 22nd Australasian Joint Conference on Advances in Artificial Intelligence

Duplicate candidate elimination and fast support calculation for frequent subgraph mining
IDEAL'09 Proceedings of the 10th international conference on Intelligent data engineering and automated learning

A new algorithm for mining frequent connected subgraphs based on adjacency matrices
Intelligent Data Analysis

Full duplicate candidate pruning for frequent connected subgraph mining
Integrated Computer-Aided Engineering

Semantically-guided clustering of text documents via frequent subgraphs discovery
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems

Parallel structural graph clustering
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III

Frequent approximate subgraphs as features for graph-based image classification
Knowledge-Based Systems

A novel approach for clustering sentiments in Chinese blogs based on graph similarity
Computers & Mathematics with Applications

Abstracting for Dimensionality Reduction in Text Classification
International Journal of Intelligent Systems

A new proposal for graph-based image classification using frequent approximate subgraphs
Pattern Recognition

Clustering web documents using hierarchical representation with multi-granularity
World Wide Web

Abstract

This paper introduces a new technique for document clustering based on frequent senses. The proposed system, GDClust (Graph-Based Document Clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find the frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and exploits a graph-based data mining technique for sense discovery and clustering. It is an automated system and requires minimal human interaction for clustering.
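
To make the Apriori idea in the abstract concrete, here is a minimal Python sketch of level-wise frequent subgraph discovery. It is not the authors' implementation, and everything in it is an assumption for illustration: each document-graph is modeled as a set of (child, parent) edges between uniquely labeled ontology concepts, so that a subgraph test reduces to a subset test; the function name `frequent_subgraphs` and the toy data are hypothetical; and a single fixed `min_support` stands in for the multilevel Gaussian minimum support, whose details the abstract does not give.

```python
from itertools import chain


def frequent_subgraphs(doc_graphs, min_support):
    """Apriori-style, level-wise search for connected edge sets that occur
    in at least `min_support` document-graphs (illustrative sketch only)."""

    def support(edges):
        # Number of document-graphs that contain every edge of the candidate.
        return sum(1 for g in doc_graphs if edges <= g)

    # Level 1: single edges that are frequent on their own.
    all_edges = set(chain.from_iterable(doc_graphs))
    level = {frozenset([e]) for e in all_edges
             if support(frozenset([e])) >= min_support}
    frequent_edges = {next(iter(s)) for s in level}

    result = []
    while level:
        result.extend(level)
        # Grow each frequent k-edge subgraph by one frequent edge that shares
        # a node with it, so every candidate stays connected.
        candidates = set()
        for sub in level:
            nodes = {n for e in sub for n in e}
            for e in frequent_edges:
                if e not in sub and (e[0] in nodes or e[1] in nodes):
                    candidates.add(sub | {e})
        # Keep only the candidates that are themselves frequent.
        level = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical toy document-graphs: edges point from a concept to its parent.
docs = [
    {("dog", "canine"), ("canine", "animal")},
    {("dog", "canine"), ("canine", "animal"), ("cat", "feline")},
    {("cat", "feline"), ("feline", "animal")},
]
found = frequent_subgraphs(docs, min_support=2)
```

In this sketch, documents that contain the same frequent subgraph could then be grouped together, which is one simple reading of the "sense-based document clusters" the abstract mentions; the actual clustering step is not specified there.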