Parallel Mining of Outliers in Large Database

Authors:
Edward Hung;David W. Cheung
Affiliations:
Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong, People's Republic of China. ehung@cs.umd.edu;Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong, People's Republic of China. dcheung@csis.hku.hk
Venue:
Distributed and Parallel Databases
Year:
2002

Citing 10
Cited 6

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining

IEEE Transactions on Knowledge and Data Engineering
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Knowledge Discovery in Databases: An Attribute-Oriented Approach

VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
A unified approach for mining outliers

CASCON '97 Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborative research
On Digital Money and Card Technologies

On Digital Money and Card Technologies

Parallel Algorithms for Distance-Based and Density-Based Outliers

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Disk aware discord discovery: finding unusual time series in terabyte sized datasets

Knowledge and Information Systems
A comprehensive survey of numeric and symbolic outlier mining techniques

Intelligent Data Analysis
A distributed approach to detect outliers in very large data sets

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Algorithms for speeding up distance-based outlier detection

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Accelerating outlier detection with uncertain data using graphics processors

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data mining is a new, important and fast growing database application. Outlier (exception) detection is one kind of data mining, which can be applied in a variety of areas like monitoring of credit card fraud and criminal activities in electronic commerce. With the ever-increasing size and attributes (dimensions) of database, previously proposed detection methods for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm (Knorr and Ng, in Proc. 24th VLDB, 1998) is linear to the dimensionality but quadratic to the dataset size, inducing an unacceptable cost for large dataset.A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the improvement of performance in PENL is linear to the number of processors, as shown in a performance comparison between ENL and PENL using Bulk Synchronization Parallel (BSP) model. The great improvement is further verified by experiments on a parallel computer system IBM 9076 SP2. The results show that it is a very good choice to mine outliers in a cluster of workstations with a low-cost interconnected by a commodity communication network.