Fuzzy data mining: a literature survey and classification framework
International Journal of Networking and Virtual Organisations
Most clustering algorithms perform poorly as the dimensionality of the data set increases, because some dimensions contain irrelevant or noisy data and random initialization of the cluster centres leads to a locally optimal clustering. In this paper, we propose a technique for reducing the effects of high dimensionality and of random initialization of the cluster centres. It consists of three phases. In the first phase, the standard deviation is used to select the meaningful dimensions from the high-dimensional data set. In the second phase, k initial centres are produced from the selected dimensions by adding a constant to, and subtracting it from, their grand mean. In the third phase, these initial cluster centres are used in k-means to find an optimal clustering of the data set. Empirical results show that the technique performs favourably in comparison with the standard k-means clustering algorithm.
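The three phases can be illustrated with a minimal sketch in plain Python. The abstract does not give the selection rule, the constant, or how the k centres are spaced, so everything specific here is an assumption made for illustration: the function names are hypothetical, phase 1 keeps the `n_keep` dimensions with the largest standard deviation, the grand mean is taken as the per-dimension mean vector, and the offsets are symmetric multiples of `constant` (for k = 2 this gives mean minus constant and mean plus constant).

```python
from statistics import mean, pstdev


def select_dimensions(data, n_keep):
    """Phase 1 (assumed rule): keep the n_keep dimensions with the largest std. deviation."""
    n_dims = len(data[0])
    spreads = [pstdev([row[j] for row in data]) for j in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda j: spreads[j], reverse=True)
    return sorted(ranked[:n_keep])


def initial_centres(points, k, constant):
    """Phase 2 (assumed scheme): shift the mean vector by symmetric multiples of a constant."""
    grand_mean = [mean(col) for col in zip(*points)]
    offsets = [(2 * i - (k - 1)) * constant for i in range(k)]
    return [[m + off for m in grand_mean] for off in offsets]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centres, max_iter=100):
    """Phase 3: standard Lloyd iterations started from the given centres."""
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
            for p in points
        ]
        # Recompute each centre as the mean of its members (keep it if empty).
        new_centres = []
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append([mean(col) for col in zip(*members)])
            else:
                new_centres.append(centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


def cluster(data, k, n_keep, constant):
    """Run the three phases in order and return (selected dims, labels, centres)."""
    dims = select_dimensions(data, n_keep)
    points = [[row[j] for j in dims] for row in data]
    labels, centres = kmeans(points, initial_centres(points, k, constant))
    return dims, labels, centres


if __name__ == "__main__":
    # Toy data: dimensions 0 and 2 separate two groups, dimension 1 is near-constant.
    toy = [
        [1.0, 5.0, 1.2], [1.2, 5.1, 0.8], [0.8, 4.9, 1.0],
        [9.0, 5.0, 9.2], [9.2, 5.1, 8.8], [8.8, 4.9, 9.0],
    ]
    print(cluster(toy, k=2, n_keep=2, constant=1.0))
```

Because the initial centres are computed deterministically from the data, repeated runs give the same clustering, which is the property the technique aims for in place of random initialization. In practice the constant would likely need to be scaled to the spread of the selected dimensions; the fixed value used here is only for the toy data.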