Recent studies have shown that link-based clustering methods can significantly improve the performance of content-based clustering. However, most previous algorithms were developed for fixed data sets and are not applicable to dynamic environments such as data warehouses and online digital libraries. In this paper, we introduce a novel approach that leverages the network structure for incremental clustering. Under this framework, both link and content information are used to determine the host cluster of a new document, and combining the two types of information yields promising clustering results. Furthermore, the status of core members is used to quickly determine whether to split or merge a new cluster. This filtering process eliminates unnecessary and time-consuming textual-similarity checks against the whole corpus, and thus greatly speeds up the entire procedure. We evaluate the proposed approach on several real-world publication data sets and conduct an extensive comparison with both classic content-based and recent link-based algorithms. The experimental results demonstrate the effectiveness and efficiency of our method.
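The abstract does not give the exact scoring rule, so the following is only a minimal sketch of the general idea of choosing a host cluster from both link and content evidence while avoiding a full-corpus scan. Everything here is an assumption for illustration, not the authors' method. The linear weighting `alpha`, the `threshold`, the rule that only clusters containing a linked neighbour are candidates, and the names `assign_document` and `cosine` are all hypothetical. The core-member split/merge check is not modelled.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_document(doc_terms, doc_links, clusters, membership,
                    alpha=0.5, threshold=0.3):
    """Hypothetical sketch: pick a host cluster for a new document.

    doc_terms:  Counter of term frequencies for the new document
    doc_links:  ids of already-clustered documents the new one links to
    clusters:   dict cluster_id -> Counter centroid (summed term frequencies)
    membership: dict doc_id -> cluster_id
    Returns a cluster id, or None to signal that a new cluster should be opened.
    """
    # Assumed filtering rule: only clusters that hold a linked neighbour are
    # candidates, so content similarity is never computed for the whole corpus.
    neighbour_counts = Counter(membership[d] for d in doc_links if d in membership)
    total_links = sum(neighbour_counts.values())
    best_id, best_score = None, threshold
    for cid, n in neighbour_counts.items():
        link_score = n / total_links                     # share of links into this cluster
        content_score = cosine(doc_terms, clusters[cid])  # textual similarity to centroid
        score = alpha * link_score + (1 - alpha) * content_score  # assumed linear mix
        if score > best_score:
            best_id, best_score = cid, score
    return best_id


# Toy usage with made-up data.
clusters = {"c1": Counter({"graph": 3, "cluster": 2}),
            "c2": Counter({"protein": 4, "gene": 1})}
membership = {"d1": "c1", "d2": "c1", "d3": "c2"}
new_doc = Counter({"graph": 1, "cluster": 1})
print(assign_document(new_doc, ["d1", "d2"], clusters, membership))  # c1
print(assign_document(new_doc, [], clusters, membership))            # None
```

In this sketch a document with no links to clustered documents always opens a new cluster. A real system would likely fall back to a content-only search in that case. The linear mix is just one simple way to combine the two signals.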