Statistical outlier detection using direct density ratio estimation

Authors:
Shohei Hido;Yuta Tsuboi;Hisashi Kashima;Masashi Sugiyama;Takafumi Kanamori
Affiliations:
IBM Research - Tokyo, Kanagawa, Japan and Graduate School of Informatics, Kyoto University, Department of Systems Science, Kyoto, Japan;IBM Research - Tokyo, Kanagawa, Japan;IBM Research - Tokyo, Kanagawa, Japan and The University of Tokyo, Department of Mathematical Informatics, Graduate School of Information Science and Technology, Tokyo, Japan;Tokyo Institute of Technology, Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo, Japan and PRESTO, Japan Science and Technology Agency, Kawaguchi, Japa ...;Nagoya University, Department of Computer Science and Mathematical Informatics, Graduate School of Information Science, Nagoya, Japan
Venue:
Knowledge and Information Systems
Year:
2011

Citing 0
Cited 7

Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search
Neural Networks

Flexible sample selection strategies for transfer learning in ranking
Information Processing and Management: an International Journal

Computational complexity of kernel-based density-ratio estimation: a condition number analysis
Machine Learning

On-line Bayesian context change detection in web service systems
Proceedings of the 2013 international workshop on Hot topics in cloud services

Change-point detection in time-series data by relative density-ratio estimation
Neural Networks

Review: A review of novelty detection
Signal Processing

A ranking-based algorithm for detection of outliers in categorical data
International Journal of Hybrid Intelligent Systems

Quantified Score

Hi-index	0.01

Abstract

We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score. This approach is expected to have better performance even in high-dimensional problems since methods for directly estimating the density ratio without going through density estimation are available. Among various density ratio estimation methods, we employ the method called unconstrained least-squares importance fitting (uLSIF) since it is equipped with natural cross-validation procedures, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. Furthermore, uLSIF offers a closed-form solution as well as a closed-form formula for the leave-one-out error, so it is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.
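
The closed-form solution mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian kernel model with centers drawn at random from the training set, and it uses fixed values for the kernel width `sigma` and the regularization parameter `lam`, whereas the paper selects them by cross-validation. The function names `gaussian_kernel` and `ulsif_inlier_scores` and all default values are hypothetical, chosen only for this example. The score approximates the ratio of training density to test density, so values near zero suggest outliers.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix between rows of x and rows of centers."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def ulsif_inlier_scores(x_train, x_test, sigma=1.0, lam=0.1,
                        n_centers=100, seed=0):
    """Estimate w(x) = p_train(x) / p_test(x) at every test point.

    Assumptions for this sketch: sigma and lam are fixed rather than
    cross-validated, and kernel centers are a random subset of x_train.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_train))
    idx = rng.choice(len(x_train), size=n_centers, replace=False)
    centers = x_train[idx]

    phi_train = gaussian_kernel(x_train, centers, sigma)  # numerator samples
    phi_test = gaussian_kernel(x_test, centers, sigma)    # denominator samples

    # Least-squares fit: H comes from the denominator (test) samples,
    # h from the numerator (training) samples.
    H = phi_test.T @ phi_test / len(x_test)
    h = phi_train.mean(axis=0)

    # Closed-form regularized solution, then clip negative coefficients.
    alpha = np.linalg.solve(H + lam * np.eye(n_centers), h)
    alpha = np.maximum(alpha, 0.0)

    return phi_test @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x_train = rng.normal(size=(200, 2))            # inliers only
    x_test = np.vstack([rng.normal(size=(50, 2)),  # mostly inliers
                        [[6.0, 6.0]]])             # one planted outlier
    scores = ulsif_inlier_scores(x_train, x_test)
    print("lowest-scoring test index:", int(np.argmin(scores)))
```

Because the coefficients come from a single linear solve, trying many `(sigma, lam)` pairs is cheap, which is what makes a cross-validated grid search practical in this setting.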