Efficient algorithms for mining outliers from large data sets
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Fast Outlier Detection in High Dimensional Spaces
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Mining distance-based outliers in near linear time with randomization and a simple pruning rule
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining distance-based outliers from large databases in any metric space
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Fast mining of distance-based outliers in high-dimensional datasets
Data Mining and Knowledge Discovery
DOLPHIN: An efficient algorithm for mining distance-based outliers in very large datasets
ACM Transactions on Knowledge Discovery from Data (TKDD)
Mining Outliers with Faster Cutoff Update and Space Utilization
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Mining outliers with faster cutoff update and space utilization
Pattern Recognition Letters
Recently, the efficiency of the outlier detection algorithm ORCA was improved by RCS (Randomization with faster Cutoff update and Space utilization after pruning), which changes the frequency of updating the cutoff value and of reclaiming memory space at pre-specified times. How and when to change these frequencies was previously determined only empirically, yet the optimal setting may vary across data sets and across computers with different CPU and disk I/O performance. In this paper, we theoretically formulate two methods that further reduce the execution time of RCS by dynamically adapting the frequencies at each step to the data set and to the machine's CPU and disk I/O performance. We conducted experiments under various conditions on a KDD Cup real data set from a network intrusion detection problem. The results show that the time saving of our optimized methods over ORCA is up to five times that of RCS, and that the saving increases with the relative disk I/O cost, the percentage of outliers to find, and the data set size.