Incremental Document Clustering Based on Graph Model

Authors:
Tu-Anh Nguyen-Hoang;Kiem Hoang;Danh Bui-Thi;Anh-Thy Nguyen
Affiliations:
Faculty of Information Technology, University of Science, VNU-HCM, Vietnam 70000;Faculty of Computer Science, University of Information Technology, VNU-HCM, Vietnam 70000;Faculty of Information Technology, University of Science, VNU-HCM, Vietnam 70000;Faculty of Information Technology, University of Science, VNU-HCM, Vietnam 70000
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

In this paper, we propose a new approach to the incremental document clustering problem, based on a graph model and an enhanced IncrementalDBSCAN. Instead of the traditional vector-based model, a graph-based model is used for document representation. With the graph model, the graph structure can be updated easily when a new document is added to the database. IncrementalDBSCAN, in turn, is an effective incremental clustering algorithm suited to mining in dynamically changing databases. The similarity between two documents is measured by a hybrid similarity that combines their adapting feature vectors with shared-phrase information. Our experimental results demonstrate the effectiveness of the proposed method.
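
As a rough illustration of the two ideas named in the abstract (a graph that is updated as each document arrives, and a hybrid similarity mixing feature vectors with shared-phrase information), the following Python sketch may help. It is not the authors' implementation. The class name PhraseGraph, the use of adjacent word pairs as "phrases", the Jaccard overlap, the plain term-frequency vectors, and the weight alpha are all assumptions made here for illustration; the abstract does not specify any of them.

```python
from collections import Counter, defaultdict
from math import sqrt


class PhraseGraph:
    """Hypothetical word graph: each edge (w1, w2) records which
    documents contain the adjacent word pair "w1 w2"."""

    def __init__(self):
        self.edges = defaultdict(set)  # (w1, w2) -> set of document ids
        self.docs = {}                 # document id -> token list

    def add_document(self, doc_id, tokens):
        """Insert one document and return, for each earlier document,
        the number of adjacent word pairs it shares with the new one.
        Only the edges touched by the new document are visited."""
        tokens = list(tokens)
        self.docs[doc_id] = tokens
        shared = Counter()
        for edge in set(zip(tokens, tokens[1:])):
            for other in self.edges[edge]:
                shared[other] += 1
            self.edges[edge].add(doc_id)
        return shared


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phrase_overlap(a_tokens, b_tokens):
    """Jaccard overlap of adjacent word pairs (an assumed stand-in
    for the paper's shared-phrase measure)."""
    pa = set(zip(a_tokens, a_tokens[1:]))
    pb = set(zip(b_tokens, b_tokens[1:]))
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def hybrid_similarity(a_tokens, b_tokens, alpha=0.5):
    """Weighted mix of vector similarity and phrase overlap.
    alpha is an assumed blending weight, not taken from the paper."""
    vec = cosine(Counter(a_tokens), Counter(b_tokens))
    return alpha * vec + (1 - alpha) * phrase_overlap(a_tokens, b_tokens)


if __name__ == "__main__":
    graph = PhraseGraph()
    d1 = "graph based document clustering".split()
    d2 = "incremental document clustering".split()
    graph.add_document("d1", d1)
    print(graph.add_document("d2", d2))  # shared word pairs with earlier docs
    print(hybrid_similarity(d1, d2))
```

In a pipeline of the kind the abstract describes, a score like this would serve as the neighborhood measure for a density-based incremental clusterer such as IncrementalDBSCAN; that step is omitted here because the abstract gives no details of the enhancement.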