A novel fuzzy kernel C-means algorithm for document clustering

Authors:
Yingshun Yin; Xiaobin Zhang; Baojun Miao; Lili Gao
Affiliations:
School of Computer Science, Xi'an Polytechnic University, Shaanxi, China; School of Computer Science, Xi'an Polytechnic University, Shaanxi, China; School of Mathematical Science, Xuchang University, Henan, China; School of Computer Science, Xi'an Polytechnic University, Shaanxi, China
Venue:
AIRS'08: Proceedings of the 4th Asia Information Retrieval Conference on Information Retrieval Technology
Year:
2008

Abstract

The Fuzzy Kernel C-Means (FKCM) algorithm can significantly improve accuracy over the classical Fuzzy C-Means (FCM) algorithm when the data are not linearly separable, are high-dimensional, or contain clusters that overlap in the input space. Despite these advantages, several drawbacks limit its use in real-world applications: it can become trapped in local optima, it is sensitive to outliers, the number of clusters c must be specified in advance, and it converges slowly. To overcome these drawbacks, semi-supervised learning and a validity index are employed. Semi-supervised learning uses a small amount of labeled data to guide the clustering of a large body of unlabeled data, which helps FKCM avoid the drawbacks listed above. The number of clusters strongly affects clustering performance, yet the optimal number cannot be assumed in advance, especially for large text corpora. A validity function makes it possible to determine a suitable number of clusters during the clustering process. A sparse data format together with a scatter-and-gather strategy saves considerable storage space and computation time. Experimental results on the Reuters-21578 benchmark dataset demonstrate that the proposed algorithm is more flexible and more accurate than the state-of-the-art FKCM.
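The abstract does not give the update equations, so the following is only a minimal sketch of how the three ingredients could fit together, under stated assumptions. It assumes one common FKCM formulation (cluster centers kept implicitly in the feature space of a Gaussian kernel, with width `sigma` a hypothetical parameter). Semi-supervision is modeled in the simplest possible way: labeled documents have their memberships clamped to one-hot vectors. The number of clusters is chosen with a Xie-Beni-style validity index computed in feature space. Documents are stored as sparse `{term: weight}` dicts, so kernel entries come from sparse dot products. All function names are hypothetical, and this is not the authors' algorithm; in particular, the scatter-and-gather strategy is not modeled.

```python
import math
import random

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[t] for t, w in a.items() if t in b)

def gaussian_kernel_matrix(docs, sigma=1.0):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), using only sparse dot products."""
    norms = [sparse_dot(d, d) for d in docs]
    n = len(docs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = max(norms[i] + norms[j] - 2.0 * sparse_dot(docs[i], docs[j]), 0.0)
            K[i][j] = K[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K

def center_weights(U, m):
    """Feature-space center v_i = sum_k w_ik phi(x_k), with w_ik = u_ik^m / sum_k u_ik^m."""
    W = []
    for row in U:
        p = [u ** m for u in row]
        s = sum(p)
        W.append([x / s for x in p])
    return W

def feature_dist2(K, W):
    """D[i][k] = ||phi(x_k) - v_i||^2 = K_kk - 2 (K w_i)_k + w_i^T K w_i."""
    n = len(K)
    D = []
    for w in W:
        Kw = [sum(K[k][j] * w[j] for j in range(n)) for k in range(n)]
        wKw = sum(w[k] * Kw[k] for k in range(n))
        D.append([max(K[k][k] - 2.0 * Kw[k] + wKw, 1e-12) for k in range(n)])
    return D

def kfcm(K, c, m=2.0, labels=None, iters=100, tol=1e-6, seed=0):
    """Return a c x n membership matrix; labels maps doc index -> fixed cluster index."""
    n = len(K)
    labels = labels or {}
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        if k in labels:  # labeled document: one-hot membership, never updated
            for i in range(c):
                U[i][k] = 1.0 if i == labels[k] else 0.0
        else:  # unlabeled document: random memberships normalized to sum to 1
            s = sum(U[i][k] for i in range(c))
            for i in range(c):
                U[i][k] /= s
    for _ in range(iters):
        D = feature_dist2(K, center_weights(U, m))
        shift = 0.0
        for k in range(n):
            if k in labels:
                continue
            # FCM membership update on squared distances: u_ik proportional to D_ik^(-1/(m-1))
            inv = [D[i][k] ** (-1.0 / (m - 1.0)) for i in range(c)]
            s = sum(inv)
            for i in range(c):
                shift = max(shift, abs(inv[i] / s - U[i][k]))
                U[i][k] = inv[i] / s
        if shift < tol:
            break
    return U

def xie_beni(K, U, m=2.0):
    """Xie-Beni-style index in feature space (requires c >= 2); smaller is better."""
    n, c = len(K), len(U)
    W = center_weights(U, m)
    D = feature_dist2(K, W)
    compact = sum(U[i][k] ** m * D[i][k] for i in range(c) for k in range(n))
    def kdot(a, b):  # inner product of two feature-space centers via the kernel
        return sum(a[p] * K[p][q] * b[q] for p in range(n) for q in range(n))
    sep = min(kdot(W[i], W[i]) + kdot(W[j], W[j]) - 2.0 * kdot(W[i], W[j])
              for i in range(c) for j in range(i + 1, c))
    return compact / (n * max(sep, 1e-12))

def choose_c(K, c_values, m=2.0, seed=0):
    """Run unsupervised KFCM for each candidate c and keep the lowest validity score."""
    scores = {c: xie_beni(K, kfcm(K, c, m, seed=seed), m) for c in c_values}
    return min(scores, key=scores.get), scores

def hard_labels(U):
    """Assign each document to the cluster with the largest membership."""
    return [max(range(len(U)), key=lambda i: U[i][k]) for k in range(len(U[0]))]

docs = [{"game": 1.0, "team": 1.0, "score": 1.0}, {"game": 1.0, "team": 1.0, "win": 1.0},
        {"team": 1.0, "score": 1.0, "win": 1.0}, {"stock": 1.0, "market": 1.0, "price": 1.0},
        {"stock": 1.0, "market": 1.0, "trade": 1.0}, {"market": 1.0, "price": 1.0, "trade": 1.0}]
K = gaussian_kernel_matrix(docs)
U = kfcm(K, 2, labels={0: 0, 3: 1})  # one labeled document per topic
print(hard_labels(U))  # expected: [0, 0, 0, 1, 1, 1]
```

Clamping labeled memberships is the crudest form of partial supervision; other formulations instead add a supervised term to the objective function. The Xie-Beni index also tends to shrink as c approaches the number of documents, so in practice `choose_c` should only scan a modest range of candidate values. The dense n x n kernel matrix used here would not scale to a full corpus such as Reuters-21578 without the kind of storage and computation savings the abstract describes.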