Outlier detection has attracted substantial attention in many applications and research areas; prominent applications include network intrusion detection and credit card fraud detection. Many existing approaches are based on calculating distances among the points in the dataset. These approaches cannot easily adapt to current datasets, which usually contain a mix of categorical and continuous attributes and may be distributed across different geographical locations. In addition, current datasets usually have a large number of dimensions; such datasets tend to be sparse, and traditional concepts such as Euclidean distance or nearest neighbor become unsuitable. We propose a fast distributed outlier detection strategy intended for datasets containing mixed attributes. The proposed method takes the sparseness of the dataset into consideration and is experimentally shown to be highly scalable with the number of points and the number of attributes in the dataset. Experimental results show that the proposed method compares very favorably with other state-of-the-art outlier detection strategies proposed in the literature, and that the speedup achieved by its distributed version is very close to linear.
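To make the limitation of distance-based approaches concrete, the following is a minimal sketch of one common distance-based scoring scheme: each point is scored by the Euclidean distance to its k-th nearest neighbor. It illustrates the family of existing approaches mentioned above, not the method proposed here. The function name `knn_outlier_scores`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Illustrative only: a larger score means the point is farther from its
    neighbors and is therefore more likely to be an outlier.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point (quadratic in dataset size).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

The sketch needs numeric coordinates for every attribute, so a categorical attribute has no natural place in it, and the pairwise loop requires all points to be available in one location. These are the two constraints that motivate a strategy designed for mixed, distributed data.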