This paper introduces a novel pairwise-adaptive dissimilarity measure for large, high-dimensional document datasets that improves both the quality and the speed of unsupervised clustering compared to the original cosine dissimilarity measure. For each pair of document vectors being compared, the measure dynamically selects a number of important features. Two approaches for choosing the number of features are discussed. This feature selection process makes the measure especially applicable to large, high-dimensional document collections. Its performance is validated on several test sets drawn from standardized datasets. The measure is compared to the well-known cosine dissimilarity measure using the average F-measure of the hierarchical agglomerative clustering result. The new measure yields a better clustering result in less computation time.
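The abstract does not say how the important features of a pair are ranked or how many are kept, so the following is only a minimal sketch of the general idea under stated assumptions: each document is a dense vector of term weights, the "important" features of a vector are taken to be its `k` largest weights, and the pair is compared with cosine dissimilarity on the union of the two selections. The function names and the fixed parameter `k` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity, or 1.0 if either vector is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 1.0
    return 1.0 - float(np.dot(a, b)) / norm


def pairwise_adaptive_dissimilarity(a, b, k):
    """Cosine dissimilarity restricted to features selected for this pair.

    Assumption (not from the paper): a feature is selected if it is among
    the k largest weights of either vector.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, a.size)
    top_a = np.argsort(a)[-k:]  # indices of the k largest weights in a
    top_b = np.argsort(b)[-k:]  # indices of the k largest weights in b
    selected = np.union1d(top_a, top_b)  # features kept for this pair only
    return cosine_dissimilarity(a[selected], b[selected])
```

Because the selected subspace depends on the pair, each comparison touches at most `2 * k` features rather than the full vocabulary, which is one plausible source of the speed-up the abstract reports. Setting `k` to the full dimensionality reduces the sketch to ordinary cosine dissimilarity.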