Distance-based outliers: algorithms and applications

Authors:
Edwin M. Knorr; Raymond T. Ng; Vladimir Tucakov
Affiliations:
Department of Computer Science, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada; Department of Computer Science, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada; Point Grey Research Inc., 101 - 1847 West Broadway, Vancouver, BC, V6J 1Y6, Canada
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2000

Abstract

This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. The existing methods for finding outliers that we have seen can deal efficiently with only two dimensions (attributes) of a dataset. In this paper, we study the notion of DB (distance-based) outliers. Specifically, we show that (i) outlier detection can be done efficiently for large datasets, and for k-dimensional datasets with large values of k (e.g., $k \ge 5$); and (ii) outlier detection is a meaningful and important knowledge discovery task.

First, we present two simple algorithms, both with a complexity of $O(kN^2)$, where k is the dimensionality and N is the number of objects in the dataset. These algorithms readily support datasets with many more than two attributes. Second, we present an optimized cell-based algorithm whose complexity is linear with respect to N but exponential with respect to k. Experimental results indicate that this algorithm significantly outperforms the two simple algorithms for $k \leq 4$. Third, for datasets that are mainly disk-resident, we present another version of the cell-based algorithm that guarantees at most three passes over a dataset. Again, experimental results show that this algorithm is by far the best for $k \leq 4$. Finally, we discuss our work on three real-life applications, including one on spatio-temporal data (e.g., a video surveillance application), to confirm the relevance and broad applicability of DB outliers.
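The abstract does not restate what a DB outlier is, so the Python sketch below assumes the usual DB(p, D) definition from this line of work: an object O in a dataset of N objects is a DB(p, D)-outlier if at least a fraction p of the objects lie farther than distance D from O, i.e. at most M = N(1 - p) objects lie within D of it. Two further assumptions are ours: the neighbour count excludes O itself, and distance is Euclidean. The function names are hypothetical. `db_outliers_nested_loop` is a plain O(kN^2) pairwise scan with early exit. `db_outliers_cell_based` is a simplified reconstruction of the cell idea, not the paper's exact algorithm: with cells of edge D/(2√k), any two objects in the same or adjacent ("layer-1") cells are within D, and any object more than ⌈2√k⌉ cells away is farther than D, so only objects in the band between ("layer-2") need explicit distance checks. The disk-resident, three-pass variant is not shown.

```python
import itertools
import math
import random
from collections import defaultdict


def db_outliers_nested_loop(points, p, d):
    """Return indices of DB(p, D)-outliers by comparing every pair: O(k N^2)."""
    n = len(points)
    m = n * (1.0 - p)  # max neighbours (self excluded) an outlier may have
    outliers = []
    for i, a in enumerate(points):
        count = 0
        for j, b in enumerate(points):
            if i != j and math.dist(a, b) <= d:
                count += 1
                if count > m:  # early exit: a cannot be an outlier
                    break
        if count <= m:
            outliers.append(i)
    return outliers


def db_outliers_cell_based(points, p, d):
    """Simplified cell-based sketch: same answer, fewer distance computations."""
    if not points:
        return []
    n, k = len(points), len(points[0])
    m = n * (1.0 - p)
    side = d / (2.0 * math.sqrt(k))        # cell edge D / (2 sqrt(k))
    reach = math.ceil(2.0 * math.sqrt(k))  # layer-2 extent, in cells

    cells = defaultdict(list)  # integer cell coordinates -> point indices
    for idx, pt in enumerate(points):
        cells[tuple(math.floor(x / side) for x in pt)].append(idx)

    outliers = []
    for cell, members in cells.items():
        near, layer2 = [], []
        for off in itertools.product(range(-reach, reach + 1), repeat=k):
            bucket = cells.get(tuple(c + o for c, o in zip(cell, off)))
            if not bucket:
                continue
            if max(abs(o) for o in off) <= 1:
                near.extend(bucket)      # own cell + layer 1: all within D
            else:
                layer2.append(bucket)    # layer 2: needs explicit distance checks
        sure = len(near) - 1             # guaranteed neighbours, self excluded
        if sure > m:
            continue                     # every object in this cell is a non-outlier
        if sure + sum(len(b) for b in layer2) <= m:
            outliers.extend(members)     # every object in this cell is an outlier
            continue
        for idx in members:              # undecided: check layer-2 objects one by one
            count = sure
            for bucket in layer2:
                for j in bucket:
                    if math.dist(points[idx], points[j]) <= d:
                        count += 1
                if count > m:
                    break
            if count <= m:
                outliers.append(idx)
    return sorted(outliers)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    data.append((8.0, 8.0))  # a planted, isolated point (index 500)
    slow = db_outliers_nested_loop(data, p=0.99, d=1.0)
    fast = db_outliers_cell_based(data, p=0.99, d=1.0)
    print(slow == fast, 500 in fast)
```

Because both functions apply the same definition, they should return the same indices; the cell-based version only replaces most pairwise distance computations with cell counts. Its per-cell enumeration of (2⌈2√k⌉ + 1)^k offsets is where the exponential dependence on k shows up, which is consistent with the abstract's observation that the cell-based approach pays off mainly for small k.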