Text stream analysis is of great practical importance, with applications such as newsgroup filtering, topic detection and tracking (TDT), and user-characterized recommendation. Clustering is one of the most important methods for analyzing text streams. However, most text stream clustering algorithms rarely account for the way features can change over a long clustering run, even though such change is common, and this leads to unsatisfactory clustering results. This paper focuses on the problem of adaptive feature selection for text stream clustering. It proposes an adaptive feature selection method based on a cluster validity index and incorporates that method into a new text stream clustering algorithm. During clustering, a threshold on the cluster validity index automatically triggers feature re-selection in order to preserve the validity of the clustering. Experiments using the Reuters-21578 collection as the text source show that the algorithm produces clustering results of reasonably high quality.
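The threshold-triggered re-selection described above can be illustrated with a minimal sketch. The abstract does not say which validity index or feature-scoring criterion the paper uses, so this example makes assumptions: document frequency stands in for the feature criterion, mean cosine similarity to the assigned centroid stands in for the validity index, and all function names (`select_features`, `assign`, `cluster_stream`) are hypothetical. Centroid updates between batches are omitted for brevity.

```python
from collections import Counter
import math


def select_features(docs, k):
    """Pick the k terms with the highest document frequency.

    Assumption: document frequency is only a stand-in criterion here.
    """
    df = Counter(term for doc in docs for term in set(doc))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(doc, features):
    """Term-count vector of a tokenized document over the current features."""
    counts = Counter(doc)
    return [float(counts[f]) for f in features]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign(vectors, centroids):
    """Label each vector with its most similar centroid.

    Returns the labels and a stand-in validity index: the mean similarity
    of each vector to its assigned centroid (an assumption, not the paper's index).
    """
    labels, sims = [], []
    for v in vectors:
        scores = [cosine(v, c) for c in centroids]
        best = max(range(len(centroids)), key=scores.__getitem__)
        labels.append(best)
        sims.append(scores[best])
    validity = sum(sims) / len(sims) if sims else 0.0
    return labels, validity


def cluster_stream(batches, n_features, n_clusters, threshold):
    """Cluster batches in arrival order, re-selecting features on low validity."""
    features = select_features(batches[0], n_features)
    centroids = [vectorize(d, features) for d in batches[0][:n_clusters]]
    reselected_at, assignments = [], []
    for i, batch in enumerate(batches):
        vectors = [vectorize(d, features) for d in batch]
        labels, validity = assign(vectors, centroids)
        if validity < threshold:
            # Validity fell below the threshold: re-select features from the
            # current batch, re-seed the centroids, and cluster the batch again.
            features = select_features(batch, n_features)
            centroids = [vectorize(d, features) for d in batch[:n_clusters]]
            vectors = [vectorize(d, features) for d in batch]
            labels, validity = assign(vectors, centroids)
            reselected_at.append(i)
        assignments.append(labels)
    return features, reselected_at, assignments
```

Calling `cluster_stream` with a list of batches, where each batch is a list of tokenized documents, returns the final feature list, the indices of the batches that triggered re-selection, and the per-batch cluster labels. Seeding centroids from the first few documents of a batch keeps the sketch short; a real system would likely use a more robust initialization.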