ACM Transactions on Mathematical Software (TOMS)
Elements of information theory
The power of sampling in knowledge discovery
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Applied multivariate techniques
Data preparation for data mining
Efficient progressive sampling
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Concept decompositions for large sparse text data using clustering
Machine Learning
Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms
Data Mining and Knowledge Discovery
Knowledge Acquisition Via Incremental Conceptual Clustering
Machine Learning
Clustering Validity Assessment: Finding the Optimal Partitioning of a Data Set
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Quality Scheme Assessment in the Clustering Process
PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Techniques for non-linear magnification transformations
INFOVIS '96 Proceedings of the 1996 IEEE Symposium on Information Visualization (INFOVIS '96)
Pattern Classification (2nd Edition)
Relationship-based clustering and cluster ensembles for high-dimensional data mining
Relationship-Based Clustering and Visualization for High-Dimensional Data Mining
INFORMS Journal on Computing
Optimization-based feature selection with adaptive instance sampling
Computers and Operations Research
Pattern Recognition, Third Edition
k-means++: the advantages of careful seeding
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
From machine learning to knowledge discovery: Survey of preprocessing and postprocessing
Intelligent Data Analysis
Determining the best K for clustering transactional datasets: A coverage density-based approach
Data & Knowledge Engineering
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems
Artificial neural networks for feature extraction and multivariate data projection
IEEE Transactions on Neural Networks
The goal of this study was to develop an efficient clustering framework for processing high-dimensional datasets with reasonable memory and computing power requirements. Strehl and Ghosh proposed a novel clustering approach and developed what they call the "relationship-based clustering framework" [1]. In this study, a preprocessing system was implemented on top of their approach and integrated into that framework. Three benchmark datasets were used to evaluate its efficiency. The results are presented in tables and charts, and CLUSION graphs are plotted to enable visual evaluation of cluster quality. CPU and memory usage decreased substantially compared with Strehl and Ghosh's framework [1], without any noticeable decrease in clustering quality. This allows the relationship-based clustering framework to be applied to much larger datasets than was previously possible, and improves its scalability with respect to the number of dimensions.