A framework for clustering categorical time-evolving data

Authors:
Fuyuan Cao;Jiye Liang;Liang Bai;Xingwang Zhao;Chuangyin Dang
Affiliations:
School of Computer and Information Technology, Shanxi University, Taiyuan, China;School of Computer and Information Technology, Shanxi University, Taiyuan, China;School of Computer and Information Technology, Shanxi University, Taiyuan, China;School of Computer and Information Technology, Shanxi University, Taiyuan, China;Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong
Venue:
IEEE Transactions on Fuzzy Systems
Year:
2010

Abstract

A fundamental assumption often made in unsupervised learning is that the problem is static, i.e., the description of the classes does not change with time. However, many practical clustering tasks involve changing environments. Methods and techniques for analyzing evolving trends in changing environments are therefore of increasing interest and importance. Although the problem of clustering numerical time-evolving data is well explored, clustering categorical time-evolving data remains a challenging issue. In this paper, we propose a generalized clustering framework for categorical time-evolving data, which is composed of three algorithms: a drifting-concept detection algorithm that detects the difference between the current sliding window and the last sliding window, a data-labeling algorithm that decides the most appropriate cluster label for each object of the current sliding window based on the clustering results of the last sliding window, and a cluster-relationship-analysis algorithm that analyzes the relationship between clustering results at different time stamps. The time-complexity analysis indicates that the proposed algorithms are effective for large datasets. Experiments on a real dataset show that the proposed framework not only accurately detects the drifting concepts but also attains clustering results of better quality. Furthermore, compared with the other framework, the proposed one needs fewer parameters, which is favorable for specific applications.
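To make the sliding-window workflow concrete, the following is a minimal sketch of the first two steps (drift detection, then data labeling) for categorical objects represented as tuples. It is not the authors' algorithm: the abstract does not state which measures the paper uses, so this sketch substitutes the mean total-variation distance between per-attribute value frequencies for drift detection and nearest-mode assignment under Hamming distance for labeling. The function names, the `modes` dictionary, and the `threshold` value of 0.3 are all hypothetical. The cluster-relationship-analysis step is not shown.

```python
from collections import Counter


def attribute_distributions(window):
    """Relative frequency of each value, per attribute, for a list of equal-length tuples."""
    n = len(window)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*window)]


def drift_score(last_window, current_window):
    """Mean total-variation distance between the two windows' per-attribute
    distributions (0.0 = identical, 1.0 = no shared values).
    Assumption: a stand-in measure, not the one defined in the paper."""
    last = attribute_distributions(last_window)
    cur = attribute_distributions(current_window)
    distances = []
    for p, q in zip(last, cur):
        values = set(p) | set(q)
        distances.append(0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in values))
    return sum(distances) / len(distances)


def hamming(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def label_window(current_window, modes):
    """Give each object the label of the closest cluster mode from the last window.
    `modes` maps a cluster label to a representative tuple (hypothetical structure)."""
    return [min(modes, key=lambda label: hamming(obj, modes[label]))
            for obj in current_window]


def process_window(last_window, current_window, modes, threshold=0.3):
    """If the windows differ by more than `threshold` (hypothetical value), report
    drift so the caller can re-cluster; otherwise reuse the last clustering."""
    score = drift_score(last_window, current_window)
    if score > threshold:
        return {"drift": True, "score": score, "labels": None}
    return {"drift": False, "score": score,
            "labels": label_window(current_window, modes)}
```

In this sketch a window flagged as drifting returns no labels, on the assumption that the caller would re-cluster it from scratch; a window that resembles its predecessor simply inherits the earlier cluster labels.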