Preprocessing enhancements to improve data mining algorithms

Authors:
Paraskevas Orfanidis;David J. Russomanno
Affiliations:
Department of Electrical and Computer Engineering, Herff College of Engineering, The University of Memphis, Memphis, TN 38152, USA;Department of Electrical and Computer Engineering, Herff College of Engineering, The University of Memphis, Memphis, TN 38152, USA
Venue:
International Journal of Business Intelligence and Data Mining
Year:
2008

Abstract

Preprocessing is often required before clustering or other data mining algorithms are used to analyse multivariate data sets. The approaches discussed in this paper are enhanced implementations of a preprocess that utilises an algorithm to cluster the points in a data set on each attribute independently, yielding additional information about the data points with respect to each of their dimensions. By combining this additional information across all dimensions and querying the results, noise, data boundaries, and likely representatives of data subsets can be identified more easily, which significantly improves the performance of subsequent clustering or data mining algorithms.
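
The general idea of clustering on each attribute independently and then combining the per-dimension results can be illustrated with a minimal Python sketch. The abstract does not say which clustering algorithm the preprocess uses, so the sketch substitutes a simple one-dimensional gap-based split as a stand-in. The function names (cluster_1d, per_attribute_labels, flag_noise), the gap and min_size parameters, and the sample points are all hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter


def cluster_1d(values, gap):
    """Label values so that sorted neighbours no more than `gap` apart share a cluster.

    Stand-in for the per-attribute clustering step; the paper's actual
    algorithm is not given in the abstract.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if values[idx] - values[prev] > gap:
            current += 1  # a large gap starts a new 1-D cluster
        labels[idx] = current
    return labels


def per_attribute_labels(points, gap):
    """Cluster each attribute independently; return one label tuple per point."""
    columns = list(zip(*points))
    column_labels = [cluster_1d(list(col), gap) for col in columns]
    return list(zip(*column_labels))


def flag_noise(signatures, min_size):
    """Flag points that fall in a 1-D cluster smaller than `min_size` in any dimension."""
    n_dims = len(signatures[0])
    sizes = [Counter(sig[d] for sig in signatures) for d in range(n_dims)]
    return [
        any(sizes[d][sig[d]] < min_size for d in range(n_dims))
        for sig in signatures
    ]


# Hypothetical 2-D data: two tight groups plus one point that is isolated
# in the second attribute only.
points = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
    (1.1, 9.0),
]

signatures = per_attribute_labels(points, gap=1.0)
noise = flag_noise(signatures, min_size=2)

for point, sig, is_noise in zip(points, signatures, noise):
    print(point, sig, "noise" if is_noise else "")
```

In this sketch the tuple of per-attribute labels plays the role of the additional information attached to each point, and flag_noise is one example of a query over the combined result: the last point looks ordinary in the first attribute but is isolated in the second, so it is flagged. Points that share a label tuple could similarly be grouped as candidate representatives of a data subset before a full clustering algorithm is run.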