CURIO: a fast outlier and outlier cluster detection algorithm for large datasets

Authors:
Aaron Ceglar; John F. Roddick; David M. W. Powers
Affiliations:
Flinders University, Adelaide, South Australia; Flinders University, Adelaide, South Australia; Flinders University, Adelaide, South Australia
Venue:
AIDM '07 Proceedings of the 2nd international workshop on Integrating artificial intelligence and data mining - Volume 84
Year:
2007

Abstract

Outlier (or anomaly) detection is an important problem in many domains, including fraud detection, risk analysis, network intrusion detection and medical diagnosis, and the discovery of significant outliers is becoming an integral aspect of data mining. This paper presents CURIO, a fast algorithm that uses quantisation and implied distance metrics, is linear in the number of objects, and requires only two sequential scans of a disk-resident dataset. CURIO includes a novel direct quantisation technique and the explicit discovery of outlier clusters. A further major attribute of CURIO is its simplicity and economy with respect to algorithm, memory footprint and data structures.
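
The abstract does not give CURIO's details, so the following is only a minimal sketch of the general idea it names: quantise objects into grid cells, then judge each object by how populated its cell neighbourhood is, using two sequential scans. It is not the authors' algorithm. The names `quantise`, `detect_outliers`, `cell_width` and `min_support` are hypothetical, the use of adjacent cells as a stand-in for an implied distance is an assumption, and the grouping of outliers into outlier clusters is left out.

```python
from collections import Counter
from itertools import product


def quantise(point, cell_width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_width) for x in point)


def detect_outliers(read_points, cell_width, min_support):
    """Flag points whose cell neighbourhood holds fewer than min_support points.

    read_points is a callable returning a fresh iterator over the data,
    standing in for one sequential scan of a disk-resident dataset.
    All parameter names are assumptions made for this sketch.
    """
    # Scan 1: count how many points fall into each occupied cell.
    counts = Counter(quantise(p, cell_width) for p in read_points())
    if not counts:
        return []

    # Offsets to the cell itself and every adjacent cell. There are 3**d of
    # them, so this sketch is only practical for a small number of dimensions.
    dims = len(next(iter(counts)))
    offsets = list(product((-1, 0, 1), repeat=dims))

    # Scan 2: a point is reported when its neighbourhood is sparsely populated.
    outliers = []
    for p in read_points():
        cell = quantise(p, cell_width)
        support = sum(
            counts.get(tuple(c + o for c, o in zip(cell, off)), 0)
            for off in offsets
        )
        if support < min_support:
            outliers.append(p)
    return outliers


data = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.1), (0.2, 0.3), (0.4, 0.5), (10.0, 10.0)]
found = detect_outliers(lambda: iter(data), cell_width=1.0, min_support=3)
```

For a fixed number of dimensions, each scan does a constant amount of work per object, so the sketch is linear in the number of objects, and only the per-cell counts are held in memory between scans.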