Enhancing effectiveness of density-based outlier mining scheme with density-similarity-neighbor-based outlier factor

Authors:
Hui Cao; Gangquan Si; Yanbin Zhang; Lixin Jia
Affiliations:
School of Electrical Engineering, Xi'an Jiao Tong University, Xi'an, Shaanxi 710049, China (all four authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2010


Abstract

This paper proposes a density-similarity-neighbor-based outlier mining algorithm for data preprocessing in data mining. First, the concept of the k-density of an object is introduced, and the similar density series (SDS) of the object is established based on the changes in the k-density of the object and the k-densities of its neighbors. Second, the average series cost (ASC) of the object is obtained as a weighted sum of the distances between adjacent objects in the object's SDS. Finally, the density-similarity-neighbor-based outlier factor (DSNOF) of the object is calculated from both the ASC of the object and the ASCs of its k-distance neighbors, and the DSNOF indicates the degree to which the object is an outlier. Experiments on synthetic and real datasets evaluate the effectiveness and performance of the proposed algorithm. The experimental results verify that the proposed algorithm achieves higher outlier-mining quality without increasing the algorithm's complexity.
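The abstract names the four steps (k-density, SDS, ASC, DSNOF) but not their exact formulas, so the Python sketch below fills them in with assumed definitions, loosely modeled on the connectivity-based outlier factor (COF). It assumes k-density is the inverse of the mean distance to the k nearest neighbours. It builds the SDS greedily by appending the unused neighbour whose k-density is closest to that of the last object in the series. It weights the adjacent-object distances with COF-style weights that decrease along the series and sum to 1, and it scores each object as its ASC divided by the mean ASC of its neighbours. All function names (`knn_table`, `k_density`, `similar_density_series`, `average_series_cost`, `dsnof`) are hypothetical, and the paper's own definitions may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_table(data: List[Point], k: int) -> List[List[int]]:
    """k nearest neighbours of every object (brute force, ties broken by index)."""
    assert 0 < k < len(data), "need 0 < k < number of objects"
    table = []
    for i in range(len(data)):
        others = [j for j in range(len(data)) if j != i]
        others.sort(key=lambda j: (math.dist(data[i], data[j]), j))
        table.append(others[:k])
    return table


def k_density(data: List[Point], nbrs: List[List[int]], i: int) -> float:
    """ASSUMED k-density: inverse of the mean distance to the k nearest neighbours."""
    total = sum(math.dist(data[i], data[j]) for j in nbrs[i])
    return len(nbrs[i]) / max(total, 1e-12)  # guard against duplicate points


def similar_density_series(nbrs: List[List[int]], dens: List[float], i: int) -> List[int]:
    """ASSUMED SDS: start at object i, then repeatedly append the unused neighbour
    of i whose k-density is closest to that of the last object in the series."""
    remaining = list(nbrs[i])  # copy so the neighbour table is not mutated
    series = [i]
    while remaining:
        last = series[-1]
        nxt = min(remaining, key=lambda j: (abs(dens[j] - dens[last]), j))
        remaining.remove(nxt)
        series.append(nxt)
    return series


def average_series_cost(data: List[Point], series: List[int]) -> float:
    """ASSUMED ASC: weighted sum of distances between adjacent objects in the
    series, with COF-style weights 2(r+1-e)/(r(r+1)) that decrease along the
    series and sum to 1."""
    r = len(series) - 1
    return sum(
        2 * (r + 1 - e) / (r * (r + 1))
        * math.dist(data[series[e - 1]], data[series[e]])
        for e in range(1, r + 1)
    )


def dsnof(data: List[Point], k: int) -> List[float]:
    """ASSUMED DSNOF: an object's ASC divided by the mean ASC of its k nearest
    neighbours; values well above 1 suggest an outlier."""
    nbrs = knn_table(data, k)
    dens = [k_density(data, nbrs, i) for i in range(len(data))]
    asc = [average_series_cost(data, similar_density_series(nbrs, dens, i))
           for i in range(len(data))]
    scores = []
    for i in range(len(data)):
        mean_asc = sum(asc[j] for j in nbrs[i]) / len(nbrs[i])
        scores.append(asc[i] / max(mean_asc, 1e-12))
    return scores


if __name__ == "__main__":
    # Four corners of a unit square, its centre, and one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
    for point, score in zip(data, dsnof(data, k=3)):
        print(point, round(score, 3))
```

On this toy set the isolated point (5, 5) receives by far the largest score, while the five square points stay near 1. Neighbour search here is brute force (roughly O(n² log n)), which is fine for illustration but not for large datasets; a spatial index would normally replace `knn_table`.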