Fast Outlier Detection in High Dimensional Spaces

Authors:
Fabrizio Angiulli;Clara Pizzuti
Venue:
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2002



Abstract

In this paper we propose a new definition of distance-based outlier that considers, for each point, the sum of the distances to its k nearest neighbors, called its weight. Outliers are the points with the largest weights. To compute these weights, we find the k nearest neighbors of each point efficiently by linearizing the search space with the Hilbert space-filling curve. The algorithm consists of two phases. The first provides an approximate solution, within a small factor, after at most d + 1 scans of the data set at low time complexity, where d is the number of dimensions of the data set. During each scan, the number of candidate points for the solution set is considerably reduced. The second phase returns the exact solution with a single scan that further examines a small fraction of the data set. Experimental results show that the algorithm always finds the exact solution during the first phase after d̄ ≪ d + 1 steps, and that it scales linearly in both the dimensionality and the size of the data set.
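
The weight-based definition in the abstract can be illustrated with a minimal brute-force sketch. This is not the Hilbert-curve algorithm the paper proposes: it compares every pair of points, so it runs in quadratic time, and it only shows what "weight" and "top outliers" mean. The function names `knn_weights` and `top_outliers`, the Euclidean metric, and the sample points are assumptions made for illustration.

```python
import math


def knn_weights(points, k):
    """Return, for each point, the sum of distances to its k nearest neighbors.

    Brute force: every point is compared with every other point.
    Euclidean distance is an assumption; the abstract does not fix a metric.
    """
    weights = []
    for i, p in enumerate(points):
        # Distances from p to all other points, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        weights.append(sum(dists[:k]))
    return weights


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest weights."""
    weights = knn_weights(points, k)
    ranked = sorted(range(len(points)), key=lambda i: weights[i], reverse=True)
    return ranked[:n]


# Four points on a unit square plus one point far away (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(top_outliers(points, k=2, n=1))  # prints [4]
```

With k = 2, each corner of the unit square has weight 2.0 (two neighbors at distance 1), while the distant point has a weight of roughly 26.2, so it is ranked first.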