Probability-based text clustering algorithm by alternately repeating two operations

Authors:
Ming Liu;Yuanchao Liu;Bingquan Liu;Lei Lin
Venue:
Journal of Information Science
Year:
2013

Abstract

Owing to the rapid advance of internet technology, users face a large amount of raw data from the World Wide Web every day, most of which is in text format. This creates a great demand among internet users for efficient text analysis techniques. Since clustering is unsupervised and requires no prior knowledge, it is widely adopted to help analyse textual data. Unfortunately, to the best of our knowledge, almost all the clustering algorithms proposed so far fail to deal with large-scale text collections. To cluster large-scale text collections precisely, this paper proposes a novel probability-based text clustering algorithm by alternately repeating two operations (abbreviated as PTCART). The algorithm simply repeats two operations, (a) feature set construction and (b) text partition, until the optimal partition is reached. Its convergence is also validated. Experimental results demonstrate that, compared with several popular text clustering algorithms, PTCART has excellent performance.
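
The abstract does not say how the feature set is built or how the probabilities are computed, so the following Python sketch only illustrates the general shape of such an alternating loop and is not the authors' PTCART. It assumes, purely for illustration, that operation (a) keeps the most frequent words of each current cluster and that operation (b) reassigns each text to the cluster with the highest Laplace-smoothed word likelihood. All function names, the `top_m` and `max_iter` parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter


def build_feature_set(docs, labels, k, top_m):
    """Operation (a), assumed form: union of each cluster's top_m frequent words."""
    features = set()
    for c in range(k):
        counts = Counter(
            w for doc, lab in zip(docs, labels) if lab == c for w in doc
        )
        features.update(w for w, _ in counts.most_common(top_m))
    return features


def partition_texts(docs, labels, k, features):
    """Operation (b), assumed form: move each text to its most probable cluster."""
    # Count feature words per cluster under the current partition.
    counts = [Counter() for _ in range(k)]
    for doc, lab in zip(docs, labels):
        counts[lab].update(w for w in doc if w in features)
    totals = [sum(c.values()) for c in counts]
    v = len(features)

    new_labels = []
    for doc in docs:
        words = [w for w in doc if w in features]
        # Log-likelihood of the text under each cluster, with add-one smoothing.
        scores = [
            sum(math.log((counts[c][w] + 1) / (totals[c] + v)) for w in words)
            for c in range(k)
        ]
        new_labels.append(max(range(k), key=scores.__getitem__))
    return new_labels


def alternate_cluster(docs, k, init_labels, top_m=5, max_iter=50):
    """Repeat (a) and (b) until the partition stops changing or max_iter is hit."""
    labels = list(init_labels)
    for _ in range(max_iter):
        features = build_feature_set(docs, labels, k, top_m)
        new_labels = partition_texts(docs, labels, k, features)
        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return labels


# Toy data (hypothetical): three "pet" texts, three "finance" texts,
# started from a deliberately imperfect partition.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "cat"],
    ["cat", "pet"],
    ["stock", "bond", "market"],
    ["bond", "market", "stock"],
    ["stock", "market"],
]
labels = alternate_cluster(docs, k=2, init_labels=[0, 0, 1, 1, 1, 0])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the loop corrects the two misplaced texts in one pass and stops on the second, when the partition no longer changes. The stopping rule here is a plain fixed-point check with an iteration cap; it makes no claim about the convergence argument given in the paper.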