A search space reduction methodology for data mining in large databases

Authors:
Angel Kuri-Morales; Fátima Rodríguez-Erazo
Affiliations:
Department of Computer Science, Instituto Tecnologico Autonomo de Mexico, Rio Hondo No. 1, Col. Tizapan San Angel, C.P. 01000 México D.F., Mexico; Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México, Ciudad Universitaria, Del. Coyoacán, C.P. 04510 México D.F., Mexico
Venue:
Engineering Applications of Artificial Intelligence
Year:
2009

Abstract

Given the present need for customer relationship management and the rapid growth in database size, many new approaches to clustering and processing large databases have been attempted. In this work, we propose a methodology based on the idea that statistically proven search space reduction is possible in practice. Two clustering models are generated: one from the full data set and another from the sampled data set. The resulting empirical distributions are then tested mathematically to verify a tight, statistically significant non-linear approximation between them.
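
The idea of clustering the full data set and a sample, then comparing the two models, can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not name the clustering algorithm or the statistical test, so the sketch assumes a plain 1-D k-means on synthetic data and a simple gap comparison as a stand-in for the paper's verification. All names (`kmeans_1d`, `cluster_shares`, `centroid_gap`, `share_gap`), the data sizes, and the sample size are hypothetical.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on 1-D data; returns the centroids in ascending order."""
    ordered = sorted(values)
    # Deterministic start: place the initial centroids at evenly spread quantiles.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a group happens to be empty.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cluster_shares(values, centroids):
    """Fraction of points assigned to each centroid (an empirical distribution)."""
    counts = [0] * len(centroids)
    for v in values:
        nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        counts[nearest] += 1
    return [c / len(values) for c in counts]


# Synthetic stand-in for a "large database": two well-separated groups (assumption).
rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(6000)] + [rng.gauss(10.0, 1.0) for _ in range(4000)]
sample = rng.sample(full, 500)  # the reduced search space

# One clustering model per data set.
full_centroids = kmeans_1d(full, 2)
sample_centroids = kmeans_1d(sample, 2)
full_shares = cluster_shares(full, full_centroids)
sample_shares = cluster_shares(sample, sample_centroids)

# Stand-in comparison: largest disagreement in centroid position and cluster share.
centroid_gap = max(abs(a - b) for a, b in zip(full_centroids, sample_centroids))
share_gap = max(abs(a - b) for a, b in zip(full_shares, sample_shares))
print("centroid gap:", centroid_gap)
print("share gap:", share_gap)
```

Small gaps suggest that the sampled model reproduces the structure found in the full data. A formal statistical test, as the abstract describes, would replace the ad hoc gap comparison used here.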