2005 Special Issue: Efficient streaming text clustering

Authors:
Shi Zhong
Affiliations:
Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
Venue:
Neural Networks - 2005 Special issue: IJCNN 2005
Year:
2005

Abstract

Clustering data streams is a recent research topic that has emerged from many real-world data mining applications and has attracted considerable attention. However, there is little work on clustering high-dimensional streaming text data. This paper combines an efficient online spherical k-means (OSKM) algorithm with an existing scalable clustering strategy to achieve fast and adaptive clustering of text streams. The OSKM algorithm modifies the spherical k-means (SPKM) algorithm by updating cluster centroids online, based on the well-known Winner-Take-All competitive learning rule. It has been shown to be as efficient as SPKM but to yield much better clustering quality. The scalable clustering strategy was previously developed to deal with very large databases that cannot fit into limited memory and that are too expensive to scan multiple times. Under this strategy, one keeps only sufficient statistics for history data, which retains (part of) the contribution of that data while accommodating the limited memory. To make the proposed clustering algorithm adaptive to data streams, we introduce a forgetting factor that applies exponential decay to the importance of history data: the older a set of text documents, the less weight it carries. Our experimental results demonstrate the efficiency of the proposed algorithm and reveal an intuitive and interesting fact for clustering text streams: one needs to forget to be adaptive.
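
The two mechanisms named in the abstract, a Winner-Take-All centroid update on unit-length vectors and exponential decay of history statistics, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The function names (`normalize`, `dot`, `oskm_step`, `process_chunk`), the constant learning rate `eta`, the decay value `gamma`, and the bookkeeping of per-cluster sums and weights are all assumptions made for illustration. In particular, the sketch only tracks the decayed statistics and does not reproduce how the paper feeds them back into clustering.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def oskm_step(centroids, doc, eta):
    """One Winner-Take-All update: only the most similar centroid moves.

    centroids: list of unit-length vectors, modified in place.
    doc: unit-length document vector.
    eta: learning rate (assumed constant here; a decreasing schedule
         could be substituted).
    Returns the index of the winning centroid.
    """
    winner = max(range(len(centroids)), key=lambda k: dot(centroids[k], doc))
    moved = [m + eta * x for m, x in zip(centroids[winner], doc)]
    centroids[winner] = normalize(moved)  # project back onto the unit sphere
    return winner


def process_chunk(centroids, sums, weights, chunk, eta=0.05, gamma=0.5):
    """Process one chunk of the stream (hypothetical bookkeeping).

    sums[k] and weights[k] are per-cluster sufficient statistics: the
    decayed sum of assigned document vectors and the decayed count.
    """
    # Forget: shrink the contribution of everything seen before this chunk,
    # so a document seen t chunks ago carries weight gamma ** t.
    for k in range(len(centroids)):
        weights[k] *= gamma
        sums[k] = [gamma * s for s in sums[k]]
    # Learn: a single online pass over the new documents.
    for raw in chunk:
        doc = normalize(raw)
        k = oskm_step(centroids, doc, eta)
        weights[k] += 1.0
        sums[k] = [s + x for s, x in zip(sums[k], doc)]
    return centroids, sums, weights
```

With `gamma = 1` nothing is forgotten and the statistics accumulate over the whole stream; with `gamma = 0` only the current chunk counts. Values in between trade stability against adaptivity, which is the trade-off the abstract's closing remark refers to.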