An efficient preprocessing stage for the relationship-based clustering framework

Authors:
Turgay Tugay Bilgin; Ali Yilmaz Camurcu
Affiliations:
Department of Computer Engineering, Maltepe University, Istanbul, Turkey (corresponding author. Tel.: +90 216 626 10 50, ext. 1409; Fax: +90 216 626 10 70; E-mail: ttbilgin@maltepe.edu.tr); Technical Education Faculty, Department of Electronics and Computer Education, Marmara University, Istanbul, Turkey
Venue:
Intelligent Data Analysis
Year:
2010

Citing 25
Cited 0

Sparse matrix test problems. ACM Transactions on Mathematical Software (TOMS)
Elements of information theory
On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning
The power of sampling in knowledge discovery. PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Applied multivariate techniques
Data preparation for data mining
Efficient progressive sampling. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Concept decompositions for large sparse text data using clustering. Machine Learning
Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Data Mining and Knowledge Discovery
Knowledge Acquisition Via Incremental Conceptual Clustering. Machine Learning
Clustering Validity Assessment: Finding the Optimal Partitioning of a Data Set. ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Quality Scheme Assessment in the Clustering Process. PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Sampling Large Databases for Association Rules. VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Techniques for non-linear magnification transformations. INFOVIS '96 Proceedings of the 1996 IEEE Symposium on Information Visualization (INFOVIS '96)
Pattern Classification (2nd Edition)
Relationship-based clustering and cluster ensembles for high-dimensional data mining
Relationship-Based Clustering and Visualization for High-Dimensional Data Mining. INFORMS Journal on Computing
Optimization-based feature selection with adaptive instance sampling. Computers and Operations Research
Pattern Recognition, Third Edition
k-means++: the advantages of careful seeding. SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
From machine learning to knowledge discovery: Survey of preprocessing and postprocessing. Intelligent Data Analysis
Determining the best K for clustering transactional datasets: A coverage density-based approach. Data & Knowledge Engineering
SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research
A clustering framework for unbalanced partitioning and outlier filtering on high dimensional datasets. ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems
Artificial neural networks for feature extraction and multivariate data projection. IEEE Transactions on Neural Networks

Abstract

The goal of this study was to develop an efficient clustering framework for processing high-dimensional datasets with reasonable memory and computing power requirements. Strehl and Ghosh proposed a novel clustering approach and developed the "relationship-based clustering framework" [1]. In this study, a preprocessing system was implemented on top of their approach and integrated into that framework. Three benchmark datasets were used to evaluate its efficiency. The results are presented in tables and charts, and CLUSION graphs are plotted to enable visual evaluation of cluster quality. CPU and memory usage are shown to decrease substantially compared with Strehl and Ghosh's framework [1], without any noticeable decrease in clustering quality. This enables the use of the relationship-based clustering framework on much larger datasets than was previously possible, and also improves its scalability with respect to the number of dimensions.