Owing to the rapid advance of internet technology, users face a large amount of raw data from the World Wide Web every day, most of which is presented as text. This creates a strong demand among internet users for efficient text analysis techniques. Because clustering is unsupervised and requires no prior knowledge, it is widely used to help analyse textual data. Unfortunately, to the best of our knowledge, almost all clustering algorithms proposed so far fail to handle large-scale text collections. To cluster large-scale text collections accurately, this paper proposes a novel probability-based text clustering algorithm that alternately repeats two operations (abbreviated as PTCART). The algorithm simply repeats two operations, (a) feature set construction and (b) text partition, until the optimal partition is reached. Its convergence is also verified. Experimental results demonstrate that, compared with several popular text clustering algorithms, PTCART achieves excellent performance.
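The alternating structure described above can be illustrated with a minimal sketch. This is not the PTCART algorithm itself: the abstract does not specify how features are chosen or how probabilities are computed, so the frequency-based feature selection, the Laplace-smoothed multinomial score, the round-robin initialisation, and all function and parameter names below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def build_feature_sets(docs, labels, k, top_n):
    """Operation (a), assumed form: keep the top_n most frequent terms per cluster."""
    counts = [Counter() for _ in range(k)]
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    return [dict(c.most_common(top_n)) for c in counts]


def partition(docs, feature_sets, vocab_size):
    """Operation (b), assumed form: assign each document to its most probable cluster."""
    labels = []
    for doc in docs:
        scores = []
        for feats in feature_sets:
            total = sum(feats.values())
            # Laplace-smoothed log-likelihood of the document's terms (an assumption).
            scores.append(sum(
                math.log((feats.get(term, 0) + 1) / (total + vocab_size))
                for term in doc
            ))
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


def alternate(docs, k, top_n=20, max_iter=50):
    """Repeat (a) and (b) until the partition stops changing or max_iter is hit."""
    vocab_size = len({term for doc in docs for term in doc})
    labels = [i % k for i in range(len(docs))]  # arbitrary initial partition
    for _ in range(max_iter):
        feature_sets = build_feature_sets(docs, labels, k, top_n)
        new_labels = partition(docs, feature_sets, vocab_size)
        if new_labels == labels:  # fixed point: the partition is stable
            break
        labels = new_labels
    return labels


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "pet"],
    ["stock", "bond", "market"],
    ["stock", "bond", "market"],
    ["cat", "dog", "pet"],
]
labels = alternate(docs, k=2)
```

In this sketch the loop stops when a pass leaves every assignment unchanged, which is one simple way to read "until the optimal partition is reached"; the paper's actual stopping criterion may differ.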