Outlier detection for high dimensional data

Authors:
Charu C. Aggarwal;Philip S. Yu
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Year:
2001

Abstract

The outlier detection problem has important applications in fraud detection, network robustness analysis, and intrusion detection. Most such applications involve high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity to find outliers based on their relationship to the rest of the data. However, in high dimensional space the data is sparse, and the notion of proximity loses its meaning. In fact, the sparsity of high dimensional data implies that, from the perspective of proximity-based definitions, every point is an almost equally good outlier. Consequently, for high dimensional data, finding meaningful outliers becomes substantially more complex and less obvious. In this paper, we discuss new techniques for outlier detection that find outliers by studying the behavior of projections of the data set.
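
The claim that proximity loses its meaning in high dimensional space can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes uniformly random points in the unit cube and Euclidean distance, and the function name `relative_contrast` is hypothetical. It measures how much farther the farthest point is than the nearest point from a query, relative to the nearest distance. When this ratio is small, nearly every point is about equally far from the query, so a proximity-based definition has little basis for singling out outliers.

```python
import math
import random


def relative_contrast(num_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to num_points uniform random points in [0, 1]^dim.

    Illustrative assumption: uniform synthetic data, not a real data set.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for increasing dimensionality.
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(500, dim), 3))
```

On this synthetic data the printed contrast should shrink as the dimensionality grows, which is the effect the abstract describes. Real data sets with correlated attributes may behave differently, so the numbers are only indicative.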