The use of centroids as prototypes for clustering text documents with the k-means family of methods is not always the best choice for representing text clusters, owing to the high dimensionality, sparsity, and low quality of text data. Especially when we seek clusters with a small number of objects, centroids may lead to poor solutions that remain close to bad initial conditions. To overcome this problem, we propose the idea of a synthetic cluster prototype, computed by first selecting a subset of the cluster objects (instances), then computing the representative of these objects, and finally selecting important features. In this spirit, we introduce the MedoidKNN synthetic prototype, which favors the representation of the dominant class in a cluster. These synthetic cluster prototypes are incorporated into the generic spherical k-means procedure, leading to a robust clustering method called k-synthetic prototypes (k-sp). A comparative experimental evaluation demonstrates the robustness of the approach, especially for small datasets and for clusters that overlap in many dimensions, as well as its superior performance against traditional and subspace clustering methods.
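The abstract describes the synthetic prototype as a three-step computation (instance selection, representative, feature selection) plugged into the update step of spherical k-means. The Python sketch below illustrates that structure under assumptions the abstract does not state: MedoidKNN is taken to mean the cluster medoid plus its `knn` most similar members, the representative is their mean, and feature selection keeps the `keep` largest weights. The function names, the parameters `knn` and `keep`, and the deterministic farthest-first seeding are hypothetical choices made for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm; spherical k-means works on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u: Vector, v: Vector) -> float:
    """Dot product, which equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def medoid_knn_prototype(members: List[Vector], knn: int, keep: int) -> Vector:
    """Hypothetical synthetic prototype in the spirit of MedoidKNN (assumed details)."""
    if not members:
        raise ValueError("cannot build a prototype for an empty cluster")
    # Step 1 (instance selection, assumed): the medoid is the member with the
    # largest total similarity to all members; it recruits its knn nearest members.
    medoid_idx = max(range(len(members)),
                     key=lambda i: sum(cosine(members[i], m) for m in members))
    medoid = members[medoid_idx]
    others = sorted((i for i in range(len(members)) if i != medoid_idx),
                    key=lambda i: cosine(medoid, members[i]), reverse=True)
    subset = [medoid] + [members[i] for i in others[:knn]]
    # Step 2 (representative): mean of the selected instances.
    dim = len(medoid)
    mean = [sum(m[d] for m in subset) / len(subset) for d in range(dim)]
    # Step 3 (feature selection): keep the `keep` heaviest features, zero the rest.
    top = set(sorted(range(dim), key=lambda d: mean[d], reverse=True)[:keep])
    return normalize([mean[d] if d in top else 0.0 for d in range(dim)])


def k_synthetic_prototypes(docs: List[Vector], k: int, knn: int = 2,
                           keep: int = 100, max_iter: int = 50
                           ) -> Tuple[List[int], List[Vector]]:
    """Spherical k-means loop whose update step uses synthetic prototypes."""
    docs = [normalize(d) for d in docs]
    # Deterministic farthest-first seeding (an assumption, used only so the
    # sketch is reproducible): start from docs[0], then repeatedly add the
    # document least similar to every prototype chosen so far.
    prototypes = [docs[0]]
    while len(prototypes) < k:
        far = min(range(len(docs)),
                  key=lambda i: max(cosine(docs[i], p) for p in prototypes))
        prototypes.append(docs[far])
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar prototype.
        new_labels = [max(range(k), key=lambda c: cosine(d, prototypes[c]))
                      for d in docs]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: rebuild each non-empty cluster's synthetic prototype;
        # an empty cluster keeps its previous prototype.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                prototypes[c] = medoid_knn_prototype(members, knn, keep)
    return labels, prototypes
```

For example, on six toy term vectors forming two groups over disjoint term pairs, `k_synthetic_prototypes(docs, k=2, knn=1, keep=2)` separates the groups, and each returned prototype is a unit vector with at most two nonzero features. Replacing the centroid with a prototype built from a few dominant members is what distinguishes this loop from plain spherical k-means.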