Efficient algorithms for mining outliers from large data sets
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Fast Outlier Detection in High Dimensional Spaces
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Mining distance-based outliers in near linear time with randomization and a simple pruning rule
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining distance-based outliers from large databases in any metric space
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Fast mining of distance-based outliers in high-dimensional datasets
Data Mining and Knowledge Discovery
DOLPHIN: An efficient algorithm for mining distance-based outliers in very large datasets
ACM Transactions on Knowledge Discovery from Data (TKDD)
Mining Outliers with Faster Cutoff Update and Space Utilization
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Mining outliers with faster cutoff update and space utilization
Pattern Recognition Letters
Recently, the efficiency of the outlier detection algorithm ORCA was improved by RCS (Randomization with faster Cutoff update and Space utilization after pruning), which changes the frequency of updating the cutoff value and of reclaiming memory space at pre-specified times. How and when to change these frequencies was previously determined only empirically, yet the optimal setting may vary across data sets and across computers with different CPU and disk I/O performance. In this paper, we theoretically formulate two methods that further reduce the execution time of RCS by dynamically adapting the frequencies at each step to the data set and to the machine's CPU and disk I/O performance. We conducted experiments under various conditions on a KDD Cup real data set from a network intrusion detection problem. The results show that the time saving of our optimized methods over ORCA is up to five times that of RCS, and that the saving increases with the relative disk I/O cost, the percentage of outliers to find, and the data set size.