Parameter-free anomaly detection for categorical data

Authors:
Shu Wu; Shengrui Wang
Affiliations:
Department of Computer Science, University of Sherbrooke, Quebec, Canada; Department of Computer Science, University of Sherbrooke, Quebec, Canada
Venue:
MLDM '11: Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2011

Abstract

Outlier detection is commonly treated as a preprocessing step that locates, in a data set, the objects that do not conform to well-defined notions of expected behavior. It is a major issue in data mining, where it is used to discover novel or rare events, actions, and phenomena. We investigate outlier detection in categorical data sets. The problem is especially challenging because it is difficult to define a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and formulate outlier detection as an optimization problem. To solve this optimization problem, we design a practical, parameter-free method named ITB. Experimental results show that ITB is much more effective and efficient than existing mainstream methods.