Tri-plots: scalable tools for multidimensional data mining

  • Authors: Agma Traina; Caetano Traina; Spiros Papadimitriou; Christos Faloutsos

  • Affiliations: University of S. Paulo at S. Carlos, Brazil; University of S. Paulo at S. Carlos, Brazil; Carnegie Mellon University; Carnegie Mellon University

  • Venue: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2001

Abstract

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pair-wise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from?

We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer the above questions. We provide a set of rules for interpreting a tri-plot, and we apply these rules to synthetic and real datasets. We also show how to use our tool for classification in cases where traditional methods (nearest neighbor, classification trees) may fail.
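
To make the second question concrete, the following is a minimal brute-force sketch of the quantities it asks about: the smallest and largest pair-wise distance across two small point sets, plus the number of cross-dataset pairs within a given radius. It is not the tri-plot method itself, which the abstract does not spell out. The function names, the `radius` parameter, the sample points, and the choice of Euclidean distance are all assumptions made for illustration, and the quadratic cost of enumerating every pair would not scale to the large datasets the paper targets.

```python
import math
from itertools import product


def cross_distances(a, b):
    """Euclidean distance for every (point in a, point in b) pair."""
    return [math.dist(p, q) for p, q in product(a, b)]


def cross_summary(a, b, radius):
    """Return (smallest distance, largest distance, pairs within radius)."""
    distances = cross_distances(a, b)
    within = sum(1 for d in distances if d <= radius)
    return min(distances), max(distances), within


# Hypothetical toy data: two tiny 2-D "clouds" of feature vectors.
healthy = [(0.0, 0.0), (1.0, 0.0)]
non_healthy = [(4.0, 0.0), (4.0, 3.0)]

d_min, d_max, within = cross_summary(healthy, non_healthy, radius=4.0)
print(d_min, d_max, within)
```

On this toy input the four cross-dataset distances are 4, 5, 3, and about 4.24, so the smallest is 3, the largest is 5, and two pairs fall within radius 4.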