Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner

Authors:
Kenji Yamanishi; Jun-ichi Takeuchi
Affiliations:
NEC Corporation, 4-1-1, Miyazaki, Miyamae, Kawasaki, Kanagawa 216-8555, Japan; NEC Corporation, 4-1-1, Miyazaki, Miyamae, Kawasaki, Kanagawa 216-8555, Japan
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Abstract

This paper addresses the problem of detecting outliers in unlabeled data. In prior work we developed SmartSifter, an on-line outlier detection algorithm based on unsupervised learning from data. Building on SmartSifter, this paper proposes a new framework for outlier filtering that applies supervised and unsupervised learning techniques iteratively, in order to make the detection process both more effective and easier to understand. The framework proceeds as follows. In the first round, we run SmartSifter on an initial dataset to assign each datum a score, where a high score indicates a high likelihood of being an outlier. Next, we create labeled examples by giving positive labels to a number of the higher-scored data and negative labels to a number of the lower-scored data. We then construct an outlier filtering rule from these examples by supervised learning; the rule is generated on the principle of minimizing extended stochastic complexity. In the second round, we filter a new dataset using the constructed rule, then run SmartSifter again on the filtered data to evaluate them and update the filtering rule. Applying our framework to network intrusion detection, we demonstrate that 1) it can significantly improve the accuracy of SmartSifter, and 2) outlier filtering rules can help the user discover a general pattern of an outlier group.
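
The first round described in the abstract (score, label by score, learn a rule) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scorer is a plain sum of squared z-scores standing in for SmartSifter, the learner is a single-feature threshold rule standing in for the rule learner based on extended stochastic complexity, and the function names, toy data, and the counts `n_pos` and `n_neg` are hypothetical. The second round is not sketched, since the abstract does not specify how the rule is updated.

```python
import random
import statistics


def outlier_scores(rows):
    """Stand-in unsupervised scorer (assumption, NOT SmartSifter):
    sum of squared per-feature z-scores; higher means more outlying."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [
        sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def label_by_score(rows, scores, n_pos, n_neg):
    """Label the n_pos highest-scored rows 1 and the n_neg lowest-scored rows 0.
    Both counts must be positive."""
    ranked = sorted(zip(scores, rows), key=lambda pair: pair[0], reverse=True)
    positives = [(row, 1) for _, row in ranked[:n_pos]]
    negatives = [(row, 0) for _, row in ranked[-n_neg:]]
    return positives + negatives


def learn_stump(examples):
    """Stand-in supervised learner (assumption, NOT the ESC-based learner):
    pick the rule 'sign * (x[j] - t) > 0' with the fewest training errors."""
    best = None
    for j in range(len(examples[0][0])):
        for t in sorted({row[j] for row, _ in examples}):
            for sign in (1, -1):
                errors = sum(
                    (sign * (row[j] - t) > 0) != bool(label)
                    for row, label in examples
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return j, t, sign


def rule_flags(rule, row):
    """Return True if the learned rule marks this row as an outlier."""
    j, t, sign = rule
    return sign * (row[j] - t) > 0


# Toy data (hypothetical): 200 ordinary points plus 5 seeded outliers.
random.seed(0)
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
outliers = [(10.0 + 0.1 * i, 0.0) for i in range(5)]
data = normal + outliers

scores = outlier_scores(data)
examples = label_by_score(data, scores, n_pos=5, n_neg=50)
rule = learn_stump(examples)
print("learned rule (feature index, threshold, sign):", rule)
```

The learned rule is readable as a single comparison on one feature, which is the kind of interpretability the abstract attributes to filtering rules; a real implementation would replace both stand-ins with the components the paper names.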