Fast Distributed Outlier Detection in Mixed-Attribute Data Sets

Authors:
Matthew Eric Otey;Amol Ghoting;Srinivasan Parthasarathy
Affiliations:
Department of Computer Science and Engineering, The Ohio State University, Columbus, USA 43210;Department of Computer Science and Engineering, The Ohio State University, Columbus, USA 43210;Department of Computer Science and Engineering, The Ohio State University, Columbus, USA 43210
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.