Among typical clustering methods, the K-means algorithm plays a central role because of its simplicity and efficiency. However, it is sensitive to the choice of initial points and prone to falling into local optima. To avoid this flaw, this paper presents a patented text clustering algorithm, Clustering by Genetic Algorithm Model (CGAM). Because a genetic algorithm (GA) simulates the natural evolutionary process and can explore a larger search space, CGAM constructs a GA fitness function and a convergence criterion for the K-means algorithm. To handle the rich semantics of Chinese texts, CGAM introduces a new method for selecting the initial centers of the GA and accounts for the contribution of features from different parts of speech. The impact of outliers is also addressed. Its performance is demonstrated by a series of experiments on both Reuters-21578 and a Chinese text corpus. Experimental results show that CGAM achieves better clustering results than other GA-based K-means algorithms, and that it has been successfully applied in a national business intelligence program involving a very large body of content in both Chinese and English.
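The general idea of pairing a GA with K-means can be illustrated with a minimal sketch. This is not CGAM itself: the abstract does not specify its fitness function, its selection of initial centers, its part-of-speech weighting, or its outlier treatment. The sketch assumes a plain within-cluster sum of squared errors as the fitness, truncation selection, one-point crossover, and random-reset mutation. All names (ga_kmeans, kmeans_step, sse) and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def sse(points, centers):
    """Assumed fitness: within-cluster sum of squared errors (lower is better)."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans_step(points, centers):
    """One K-means iteration: reassign points, then recompute cluster means."""
    labels = assign(points, centers)
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)  # an empty cluster keeps its previous center
    return new_centers


def ga_kmeans(points, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve sets of k centers; returns (best_centers, labels)."""
    rng = random.Random(seed)
    # Each chromosome is a list of k candidate centers sampled from the data.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    best = min(population, key=lambda c: sse(points, c))
    for _ in range(generations):
        # Local refinement: apply one K-means step to every chromosome.
        population = [kmeans_step(points, c) for c in population]
        population.sort(key=lambda c: sse(points, c))
        if sse(points, population[0]) < sse(points, best):
            best = population[0]
        # Truncation selection: the better half survives as parents.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(a[:cut]) + list(b[cut:])  # one-point crossover
            # Mutation: occasionally reset a center to a random data point.
            child = [rng.choice(points) if rng.random() < mutation_rate else c
                     for c in child]
            children.append(child)
        population = parents + children
    return best, assign(points, best)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, labels = ga_kmeans(data, k=2)
    print(centers, labels)
```

Keeping a population of center sets lets the search recover from a poor initial choice that would trap a single K-means run, at the cost of evaluating the fitness many more times per generation.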