A fast and effective method to find correlations among attributes in databases

Authors:
Elaine P. Sousa;Caetano Traina, Jr.;Agma J. Traina;Leejay Wu;Christos Faloutsos
Affiliations:
Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil;Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil;Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil;Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA;Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2007

Abstract

The problem of identifying meaningful patterns in a database lies at the very heart of data mining. A core objective of data mining processes is the recognition of inter-attribute correlations. Correlations are necessary for predictions and classifications, since rules would fail in the absence of patterns. In addition, identifying groups of mutually correlated attributes expedites the selection of a representative subset of attributes, from which the others can be derived through existing mappings. In this paper, we describe a scalable, effective algorithm to identify groups of correlated attributes. The algorithm can handle non-linear correlations between attributes and is not restricted to a specific family of mapping functions, such as the set of polynomials. We evaluate the algorithm on synthetic and real-world datasets and show that it is able to spot the correlated attributes. Moreover, the execution time of the proposed technique is linear in the number of elements and in the number of correlations in the dataset.
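
The abstract does not spell out how correlated attributes are detected, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It uses a box-counting estimate of the correlation fractal dimension as a stand-in technique: an attribute that is a function of another (here the non-linear mapping y = x^2) should add almost nothing to the intrinsic dimension of the data, while an independent attribute should add roughly one. The function name `correlation_dimension`, the grid levels, and the synthetic data are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def correlation_dimension(points, levels=(1, 2, 3, 4)):
    """Box-counting estimate of the correlation fractal dimension.

    Assumes every coordinate lies in [0, 1]. For each grid of side
    r = 2**-level, sums the squared cell occupancies S(r) and returns the
    least-squares slope of log S(r) versus log r.
    """
    log_r, log_s = [], []
    for level in levels:
        cells = 2 ** level
        # Map each point to the grid cell that contains it.
        counts = Counter(
            tuple(min(int(v * cells), cells - 1) for v in p) for p in points
        )
        log_r.append(math.log(1.0 / cells))
        log_s.append(math.log(sum(c * c for c in counts.values())))
    # Ordinary least-squares slope.
    mean_r = sum(log_r) / len(log_r)
    mean_s = sum(log_s) / len(log_s)
    num = sum((a - mean_r) * (b - mean_s) for a, b in zip(log_r, log_s))
    den = sum((a - mean_r) ** 2 for a in log_r)
    return num / den


# Synthetic data (an assumption for illustration): one base attribute x,
# one attribute derived from it non-linearly, and one independent attribute.
rng = random.Random(0)
x = [rng.random() for _ in range(5000)]
base = correlation_dimension([(v,) for v in x])
gain_correlated = correlation_dimension([(v, v * v) for v in x]) - base
gain_independent = correlation_dimension([(v, rng.random()) for v in x]) - base

print(f"dimension gained by adding y = x^2:       {gain_correlated:.2f}")
print(f"dimension gained by adding an independent y: {gain_independent:.2f}")
```

Each grid level needs only one pass over the points, which is one plausible way for a method of this kind to scale linearly with the number of elements. Nothing about the polynomial form of the mapping is used, so the same check applies to other functional relationships.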