Tri-plots: scalable tools for multidimensional data mining

Authors:
Agma Traina;Caetano Traina;Spiros Papadimitriou;Christos Faloutsos
Affiliations:
University of S. Paulo at S. Carlos, Brazil;University of S. Paulo at S. Carlos, Brazil;Carnegie Mellon University;Carnegie Mellon University
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Citing 16

BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

A cost model for nearest neighbor search in high-dimensional data space
PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

The pyramid-technique: towards breaking the curse of dimensionality
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Spatial join selectivity using power laws
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Using the fractal dimension to cluster datasets
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining

Mining Very Large Databases
Computer

Constraint-Based, Multidimensional Data Mining
Computer

Data Mining: An Overview from a Database Perspective
IEEE Transactions on Knowledge and Data Engineering

Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining
IEEE Transactions on Knowledge and Data Engineering

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases

Spatial Data Mining: A Database Approach
SSD '97 Proceedings of the 5th International Symposium on Advances in Spatial Databases

Constraint-Based Rule Mining in Large, Dense Databases
ICDE '99 Proceedings of the 15th International Conference on Data Engineering

Deflating the Dimensionality Curse Using Multiple Fractal Dimensions
ICDE '00 Proceedings of the 16th International Conference on Data Engineering

Cited 8

Requirements for clustering data streams
ACM SIGKDD Explorations Newsletter

"GeoPlot": spatial data mining on video libraries
Proceedings of the eleventh international conference on Information and knowledge management

A fast and effective method to find correlations among attributes in databases
Data Mining and Knowledge Discovery

LearnMet: learning domain-specific distance metrics for plots of scientific functions
Multimedia Tools and Applications

Component Selection to Optimize Distance Function Learning in Complex Scientific Data Sets
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications

Measuring evolving data streams' behavior through their intrinsic dimension
New Generation Computing

Mining images of material nanostructure data
ICDCIT'06 Proceedings of the Third international conference on Distributed Computing and Internet Technology

A cross datasets referring outlier detection model applied to suspicious financial transaction discrimination
WISI'06 Proceedings of the 2006 international conference on Intelligence and Security Informatics

Abstract

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pair-wise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from?

We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer the above questions. We provide a set of rules on how to interpret a tri-plot, and we apply these rules on synthetic and real datasets. We also show how to use our tool for classification, when traditional methods (nearest neighbor, classification trees) may fail.
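
The abstract does not say how a tri-plot is constructed, so the following is only a minimal sketch of the cross-dataset distance questions it raises, not the authors' method. It assumes Euclidean distance and a brute-force scan over all pairs, which would not scale to the large datasets the paper targets. The function names (`cross_distances`, `pairs_within`) and the toy `healthy` and `sick` point sets are hypothetical, introduced purely for illustration.

```python
import bisect
import math
from itertools import product


def cross_distances(a, b):
    """Sorted Euclidean distances between every point of a and every point of b."""
    return sorted(math.dist(p, q) for p, q in product(a, b))


def pairs_within(sorted_dists, r):
    """Count cross-dataset pairs whose distance is at most r."""
    return bisect.bisect_right(sorted_dists, r)


# Hypothetical toy feature vectors for the two clouds of points.
healthy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
sick = [(3.0, 4.0), (4.0, 4.0)]

d = cross_distances(healthy, sick)
smallest, largest = d[0], d[-1]

# Pair counts at a few radii: the kind of cumulative quantity one could
# plot against r to compare how the two clouds sit relative to each other.
counts = [(r, pairs_within(d, r)) for r in (1.0, 4.5, 6.0)]
print(smallest, largest, counts)
```

A large smallest cross-dataset distance relative to the spread inside each cloud would suggest the two clouds are well separated; that interpretation is an informal reading, not one of the paper's stated rules.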