FindOut: finding outliers in very large datasets

Authors:
Dantong Yu;Gholamhosein Sheikholeslami;Aidong Zhang
Affiliations:
Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York;Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York;Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York
Venue:
Knowledge and Information Systems
Year:
2002

Abstract

Finding rare instances, or outliers, is important in many knowledge discovery and data mining (KDD) applications, such as detecting credit card fraud or finding irregularities in gene expression. Signal-processing techniques have been introduced to transform images for enhancement, filtering, restoration, analysis, and reconstruction. In this paper, we present a new method that applies signal-processing techniques to important problems in data mining. In particular, we introduce a novel deviation (or outlier) detection approach, termed FindOut, based on the wavelet transform. The main idea in FindOut is to remove the clusters from the original data and then identify the outliers. Although previous research showed that such techniques may not be effective because of the nature of clustering, FindOut can successfully identify outliers in large datasets. Experimental results on very large datasets show the efficiency and effectiveness of the proposed approach.
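The abstract gives only the high-level idea (remove clusters with a wavelet-based step, then keep what is left), so the following is a minimal illustrative sketch and not the authors' algorithm. It assumes 2-D data, a uniform grid, and a single level of Haar-style averaging as the low-pass filter; the function names and the parameters `cells` and `density_threshold` are hypothetical.

```python
import numpy as np

def haar_approximation(grid):
    """One level of a 2-D Haar-style low-pass band: average each 2x2 block."""
    h, w = grid.shape
    return grid.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def outlier_candidates(points, cells=16, density_threshold=1.0):
    """Return a boolean mask marking points that lie outside dense regions.

    Assumptions (not from the paper): `cells` is an even grid resolution
    per axis, and `density_threshold` is the mean count per fine cell above
    which a coarse block is treated as part of a cluster.
    """
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    # Map each point to an integer cell index in [0, cells - 1].
    scaled = (points - lo) / np.where(hi > lo, hi - lo, 1.0)
    idx = np.minimum((scaled * cells).astype(int), cells - 1)
    # Count points per cell (the quantized feature space).
    grid = np.zeros((cells, cells))
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1)
    # Low-pass band: dense blocks stand out, isolated points are averaged down.
    dense = haar_approximation(grid) > density_threshold
    # Points in dense blocks are removed as cluster members; the rest remain.
    return ~dense[idx[:, 0] // 2, idx[:, 1] // 2]
```

Calling `outlier_candidates(points)` on an `(n, 2)` array returns one boolean per point, with `True` for points that survive cluster removal. Because the grid is filled in a single pass over the data, the cost of this sketch grows linearly with the number of points, which is the property that makes grid-based approaches attractive for very large datasets.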