A fast and effective method to find correlations among attributes in databases

  • Authors:
  • Elaine P. Sousa; Caetano Traina, Jr.; Agma J. Traina; Leejay Wu; Christos Faloutsos

  • Affiliations:
  • Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil; Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil; Department of Computer Science, University of São Paulo at São Carlos, São Carlos, Brazil; Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA; Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2007

Abstract

The problem of identifying meaningful patterns in a database lies at the very heart of data mining. A core objective of data mining processes is the recognition of inter-attribute correlations. Correlations are necessary for prediction and classification, since rules would fail in the absence of patterns. In addition, identifying groups of mutually correlated attributes expedites the selection of a representative subset of attributes, from which the others can be derived through existing mappings. In this paper, we describe a scalable, effective algorithm to identify groups of correlated attributes. This algorithm can handle non-linear correlations between attributes, and is not restricted to a specific family of mapping functions, such as the set of polynomials. We show the results of our evaluation of the algorithm applied to synthetic and real-world datasets, and demonstrate that it is able to spot the correlated attributes. Moreover, the execution time of the proposed technique is linear in the number of elements and of correlations in the dataset.
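
To illustrate why support for non-linear correlations matters, the following minimal Python sketch shows a purely linear measure (the Pearson coefficient) missing a deterministic quadratic dependency between two attributes. It is not the algorithm described in the paper, whose details the abstract does not give; the `pearson` helper and the toy data are assumptions made only for this illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy attribute, symmetric around zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Attribute fully determined by xs through a linear mapping.
linear = [3.0 * x + 1.0 for x in xs]

# Attribute fully determined by xs through a non-linear (quadratic) mapping.
quadratic = [x * x for x in xs]

r_linear = pearson(xs, linear)        # the linear dependency is detected
r_quadratic = pearson(xs, quadratic)  # the quadratic dependency is missed

print(f"linear mapping:    r = {r_linear:.3f}")
print(f"quadratic mapping: r = {r_quadratic:.3f}")
```

Both derived attributes are fully determined by `xs`, yet only the linear one yields a coefficient close to 1. A method restricted to linear (or to any fixed family of) mappings would therefore not group the quadratic attribute with its source.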