On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms

Authors:
Kenji Yamanishi; Jun-Ichi Takeuchi; Graham Williams; Peter Milne
Affiliations:
Internet Systems Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae, Kawasaki, Kanagawa 216-8555, Japan (Kenji Yamanishi, k-yamanishi@cw.jp.nec.com; Jun-Ichi Takeuchi, tak@ap.jp.nec.com)
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra ACT 2601, Australia (Graham Williams, Graham.Williams@cmis.csiro.au; Peter Milne, Peter.Milne@cmis.csiro.au)
Venue:
Data Mining and Knowledge Discovery
Year:
2004

Abstract

Outlier detection is a fundamental issue in data mining, with applications in fraud detection, network intrusion detection, network monitoring, and elsewhere. SmartSifter is an outlier detection engine that addresses this problem from the viewpoint of statistical learning theory. This paper provides a theoretical basis for SmartSifter and empirically demonstrates its effectiveness. SmartSifter detects outliers on-line by learning, without supervision, a probabilistic model of the information source in the form of a finite mixture model. Each time a datum is input, SmartSifter updates this model with an on-line discounting learning algorithm. The datum is then given a score based on the learned model, with a high score indicating that the datum is likely to be a statistical outlier. The novel features of SmartSifter are: (1) it is adaptive to non-stationary sources of data; (2) its score has a clear statistical/information-theoretic meaning; (3) it is computationally inexpensive; and (4) it can handle both categorical and continuous variables. An experimental application to network intrusion detection shows that the data to which SmartSifter assigned high scores corresponded to attacks, and that it achieved this at low computational cost. A further application to actual health insurance pathology data from Australia's Health Insurance Commission identified a number of meaningful rare cases.
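The core loop described above (score each incoming datum under the current model, then update the model while discounting older evidence) can be illustrated with a small sketch. This is not SmartSifter's implementation: it assumes a one-dimensional Gaussian mixture, a hypothetical discount rate `r`, a hypothetical variance floor `min_var`, and it uses the logarithmic loss -log p(x) as one information-theoretic choice of score. The class and parameter names are hypothetical, and categorical attributes are not handled.

```python
import math
import random


class DiscountingGaussianMixture:
    """Hypothetical on-line 1-D Gaussian mixture with geometrically discounted statistics.

    Assumed parameters (not taken from the paper): r is the discount rate,
    init_var the initial component variance, min_var a floor that stops a
    component's variance from collapsing to zero.
    """

    def __init__(self, means, r=0.01, init_var=1.0, min_var=1e-3):
        k = len(means)
        self.r = r
        self.min_var = min_var
        self.weights = [1.0 / k] * k
        self.means = list(means)
        self.vars = [init_var] * k
        # Discounted, responsibility-weighted first and second moments per component.
        self.m1 = [w * m for w, m in zip(self.weights, self.means)]
        self.m2 = [w * (v + m * m) for w, m, v in zip(self.weights, self.means, self.vars)]

    def _weighted_densities(self, x):
        # c_i * N(x | mu_i, var_i) for each component i.
        return [
            w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(self.weights, self.means, self.vars)
        ]

    def score_and_update(self, x):
        """Return -log p(x) under the model learned so far, then learn from x."""
        dens = self._weighted_densities(x)
        total = sum(dens)
        # Log-loss score; the floor avoids log(0) when every density underflows.
        score = -math.log(max(total, 1e-300))
        # Single-datum E-step: responsibility of each component for x.
        if total > 0.0:
            resp = [d / total for d in dens]
        else:
            resp = [1.0 / len(dens)] * len(dens)
        r = self.r
        for i, g in enumerate(resp):
            # Discounted M-step: previous statistics decay by a factor (1 - r).
            self.weights[i] = (1.0 - r) * self.weights[i] + r * g
            self.m1[i] = (1.0 - r) * self.m1[i] + r * g * x
            self.m2[i] = (1.0 - r) * self.m2[i] + r * g * x * x
            self.means[i] = self.m1[i] / self.weights[i]
            var = self.m2[i] / self.weights[i] - self.means[i] ** 2
            self.vars[i] = max(var, self.min_var)
        return score


# Usage: learn from a synthetic stream of inliers, then score an extreme value.
random.seed(0)
model = DiscountingGaussianMixture(means=[-1.0, 1.0], r=0.02)
stream = [random.gauss(0.0, 1.0) for _ in range(500)]
scores = [model.score_and_update(x) for x in stream]
outlier_score = model.score_and_update(25.0)
```

Because every sufficient statistic is multiplied by (1 - r) at each step, a datum seen t steps ago contributes with weight proportional to (1 - r)^t; this is what lets such a model track a non-stationary source, and a larger `r` makes it forget faster. The mixture weights remain a probability vector, since (1 - r) times the old weights plus r times the responsibilities still sums to one.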