Projected outlier detection in high-dimensional mixed-attributes data set

Authors:
Mao Ye;Xue Li;Maria E. Orlowska
Affiliations:
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China;School of Information Technology and Electronic Engineering, The University of Queensland, Brisbane, Qld 4072, Australia;Polish-Japanese Institute of Information Technology, Faculty of Information Technology, Ul. Koszykowa 86, 02-008 Warsaw, Poland
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Citing 17
Cited 4

Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
ROCK: a robust clustering algorithm for categorical attributes

Information Systems
A Monte Carlo algorithm for fast projective clustering

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Applications of Data Mining in Computer Security

Applications of Data Mining in Computer Security
Adaptive Fraud Detection

Data Mining and Knowledge Discovery
Experiments with Noise Filtering in a Medical Domain

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
ADMIT: anomaly-based data mining for intrusions

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Subspace clustering for high dimensional data: a review

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Projective Clustering by Histograms

IEEE Transactions on Knowledge and Data Engineering
An effective and efficient algorithm for high-dimensional outlier detection

The VLDB Journal — The International Journal on Very Large Data Bases
Fast Distributed Outlier Detection in Mixed-Attribute Data Sets

Data Mining and Knowledge Discovery
Comparing Subspace Clusterings

IEEE Transactions on Knowledge and Data Engineering
HOT: hypergraph-based outlier test for categorical data

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining

Enhancing effectiveness of density-based outlier mining scheme with density-similarity-neighbor-based outlier factor

Expert Systems with Applications: An International Journal
Finding key attribute subset in dataset for outlier detection

Knowledge-Based Systems
An unbiased distance-based outlier detection approach for high-dimensional data

DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
Robust data clustering by learning multi-metric Lq-norm distances

Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	12.05

Visualization

Abstract

Detecting outlier efficiently is an active research issue in data mining, which has important applications in the field of fraud detection, network intrusion detection, monitoring criminal activities in electronic commerce, etc. Because of the sparsity of high dimensional data, it is reasonable and meaningful to detect the outliers in suitable projected subspaces. We call such subspace and outliers in the subspace as anomaly subspace and projected outlier respectively. Many efficient algorithms have already been proposed for outlier detection based on different approaches, but there are few literatures on projected outlier detection for high dimensional data sets with mixed continuous and categorical attributes. In this paper, a novel projected outlier detection algorithm is proposed to detect projected outliers in high-dimensional mixed attribute data set. Our main contributions are: (1) combined with information entropy, a novel measure of anomaly subspace is proposed. In this anomaly subspace, meaningful outliers could be detected and explained. Unlike the previous projected outlier detection methods, the dimension of anomaly subspace is not decided beforehand; (2) theoretical analysis about this measure is presented; (3) bottom-up method is proposed to find the interesting anomaly subspaces; (4) the outlying degree of projected outlier is defined, which has good explanations; (5) the data set with mixed data type is handled; (6) experiments on synthetic and real data sets to evaluate the effectiveness of our approach are performed.