A ranking-based algorithm for detection of outliers in categorical data

Authors:
N.N.R. Ranga Suri; M. Narasimha Murty; G. Athithan
Affiliations:
Centre for Artificial Intelligence and Robotics (CAIR), C V Raman Nagar, Bangalore, India; Department of CSA, Indian Institute of Science (IISc), Bangalore, India; Centre for Artificial Intelligence and Robotics (CAIR), C V Raman Nagar, Bangalore, India, and presently working at Scientific Analysis Group (SAG), Delhi, India
Venue:
International Journal of Hybrid Intelligent Systems
Year:
2014


Abstract

Outlier detection is an important data mining problem that has attracted considerable research interest in the recent past. As a result, various outlier detection methods have been developed, particularly for numerical data, whereas categorical data still needs attention. To address this gap, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. The first phase explores a clustering of the given data; the second, ranking phase determines the set of most likely outliers. The proposed algorithm is expected to perform better because it can identify different types of outliers by employing two independent ranking schemes, one based on attribute value frequencies and the other on the inherent clustering structure of the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of the algorithm is demonstrated through experiments on various public domain categorical data sets.
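
The abstract gives no formulas, so the following is only a rough sketch of the general idea of combining two independent rankings, not the authors' algorithm. It assumes that a record is more outlying when its attribute values are rare and when it belongs to a small cluster, that cluster labels are supplied by some categorical clustering run beforehand, and that the two rankings are combined by taking the union of their top p entries. All function names, the scoring rules, and the parameter p are hypothetical choices made for illustration.

```python
from collections import Counter


def frequency_scores(records):
    """Mean relative frequency of each record's attribute values (low = rarer).

    Assumption: this averaging rule is illustrative, not taken from the paper.
    """
    n = len(records)
    n_attrs = len(records[0])
    # One counter per attribute position.
    counts = [Counter(rec[j] for rec in records) for j in range(n_attrs)]
    return [
        sum(counts[j][rec[j]] for j in range(n_attrs)) / (n * n_attrs)
        for rec in records
    ]


def cluster_size_scores(labels):
    """Relative size of the cluster each record belongs to (low = smaller).

    Assumption: cluster size stands in for the 'clustering structure' signal.
    """
    sizes = Counter(labels)
    n = len(labels)
    return [sizes[lab] / n for lab in labels]


def rank_ascending(scores):
    """Record indices ordered from lowest to highest score; ties by index."""
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))


def likely_outliers(records, labels, p):
    """Union of the top-p records from the two independent rankings."""
    freq_rank = rank_ascending(frequency_scores(records))
    clus_rank = rank_ascending(cluster_size_scores(labels))
    return sorted(set(freq_rank[:p]) | set(clus_rank[:p]))


# Toy data: six records with two categorical attributes each.
records = [
    ("red", "small"), ("red", "small"), ("red", "small"),
    ("red", "large"), ("blue", "small"), ("green", "huge"),
]
# Hypothetical cluster labels, e.g. from a k-modes style clustering.
labels = [0, 0, 0, 0, 1, 1]

print(likely_outliers(records, labels, p=1))
```

In this sketch both rankings are computed over the full data set before the top p entries are selected, so the dominant cost does not grow with p. That is consistent in spirit with the complexity remark in the abstract, though the paper's own analysis may rest on different grounds.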