Mining distance-based outliers in near linear time with randomization and a simple pruning rule

Authors:
Stephen D. Bay;Mark Schwabacher
Affiliations:
Institute for the Study of Learning and Expertise, Palo Alto, CA;NASA Ames Research Center, Moffett Field, CA
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Much recent work has aimed at finding fast algorithms for this task. We show that a simple nested-loop algorithm, which is quadratic in the worst case, can give near-linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near-linear scaling holds over several orders of magnitude. Our average-case analysis suggests that much of the efficiency arises because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
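The abstract does not spell out the pruning rule, so the following is only a minimal sketch of how a randomized nested loop with pruning might look, not the authors' implementation. It assumes an outlier's score is the distance to its k-th nearest neighbor, that only the top n outliers are wanted, and that an example can be abandoned as soon as its running score falls below the weakest score in the current top n. The function name `top_outliers` and its parameters are hypothetical.

```python
import heapq
import math
import random


def top_outliers(points, k=2, n=1, seed=0):
    """Return the n (score, index) pairs with the largest k-th nearest-neighbor distance.

    Illustrative sketch only: the score definition and the pruning rule are assumptions.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # scan the data in random order

    cutoff = 0.0  # score of the weakest outlier in the current top n
    top = []      # min-heap of (score, index)

    for i in order:
        nearest = []  # negated distances: max-heap of the k smallest seen so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # -nearest[0] is an upper bound on the final score of example i.
            # Once it drops below the cutoff, i cannot enter the top n.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]  # the cutoff only ever increases
    return sorted(top, reverse=True)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_outliers(pts, k=2, n=1))
```

In this sketch a typical non-outlier finds k close neighbors after only a few comparisons and is then pruned. That is consistent with the abstract's observation that the time to process non-outliers does not depend on the size of the data set.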