Subspace sums for extracting non-random data from massive noise

Authors:
Anne M. Denton
Affiliations:
North Dakota State University, Department of Computer Science and Operations Research, 58108-6050, Fargo, ND, USA
Venue:
Knowledge and Information Systems
Year:
2009

Abstract

An algorithm is introduced that distinguishes relevant data points from randomly distributed noise. The algorithm is related to subspace clustering based on axis-parallel projections, but considers membership in any projected cluster of a given side length, as opposed to a particular cluster. An aggregate measure is introduced that is based on the total number of points that are close to the given point in all possible 2^d projections of a d-dimensional hypercube. No explicit summation over subspaces is required for evaluating this measure. Attribute values are normalized based on rank order to avoid making assumptions about the distribution of random data. Effectiveness of the algorithm is demonstrated through comparison with conventional outlier detection on a real microarray data set as well as on time series subsequence data.
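
The following Python sketch illustrates one way an aggregate over all 2^d axis-parallel projections can be evaluated without enumerating subspaces. It is not taken from the paper and may differ from the published formula. It assumes that a neighbor is "close" in a subspace when it lies within a fixed half-width of the query point in every dimension of that subspace, and that the empty subspace is counted. Under those assumptions, a neighbor that matches in m individual dimensions is close in exactly 2^m subspaces, so the sum reduces to one pass over the points. The function names (rank_normalize, subspace_sum, subspace_sum_brute) and the half_width parameter are hypothetical.

```python
from itertools import combinations


def rank_normalize(data):
    """Replace each attribute value by its rank divided by n (ties broken by input order)."""
    n = len(data)
    d = len(data[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order):
            out[i][j] = rank / n
    return out


def subspace_sum(points, idx, half_width):
    """Total neighbor count of points[idx] over all 2^d axis-parallel subspaces.

    Assumption: a neighbor matching in m single dimensions is close in
    2^m subspaces (every subset of the matching dimensions), so no
    explicit loop over subspaces is needed.
    """
    x = points[idx]
    total = 0
    for k, y in enumerate(points):
        if k == idx:
            continue
        m = sum(1 for a, b in zip(x, y) if abs(a - b) <= half_width)
        total += 2 ** m
    return total


def subspace_sum_brute(points, idx, half_width):
    """Same quantity by explicit enumeration of all 2^d subspaces (for checking only)."""
    x = points[idx]
    d = len(x)
    total = 0
    for r in range(d + 1):
        for dims in combinations(range(d), r):
            for k, y in enumerate(points):
                if k != idx and all(abs(x[j] - y[j]) <= half_width for j in dims):
                    total += 1
    return total


if __name__ == "__main__":
    raw = [[0.1, 5.0, 3.0], [0.2, 4.0, 9.0], [0.9, 1.0, 2.0], [0.15, 4.5, 8.0]]
    normalized = rank_normalize(raw)
    for i in range(len(normalized)):
        print(i, subspace_sum(normalized, i, 0.25))
```

The brute-force variant costs time exponential in d and is included only to check the shortcut on small inputs. The one-pass version is linear in d for each pair of points. Points with unusually high sums would be candidates for non-random data under this reading of the abstract.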