A ranking-based algorithm for detection of outliers in categorical data

  • Authors:
  • N.N.R. Ranga Suri; M. Narasimha Murty; G. Athithan

  • Affiliations:
  • Centre for Artificial Intelligence and Robotics (CAIR), C V Raman Nagar, Bangalore, India; Department of CSA, Indian Institute of Science (IISc), Bangalore, India; Centre for Artificial Intelligence and Robotics (CAIR), C V Raman Nagar, Bangalore, India, presently at Scientific Analysis Group (SAG), Delhi, India

  • Venue:
  • International Journal of Hybrid Intelligent Systems
  • Year:
  • 2014

Abstract

Outlier detection is an important data mining problem that has attracted considerable research interest in recent years. As a result, various outlier detection methods have been developed, most of them for numerical data, whereas categorical data has received comparatively little attention. To address this gap, we propose a two-phase algorithm for detecting outliers in categorical data, based on a novel definition of outliers. In the first phase, the algorithm explores a clustering of the given data; in the second, a ranking phase determines the set of most likely outliers. The proposed algorithm is expected to perform better because it can identify different types of outliers by employing two independent ranking schemes, one based on attribute value frequencies and the other on the inherent clustering structure of the data. Unlike some existing methods, its computational complexity is not affected by the number of outliers to be detected. The efficacy of the algorithm is demonstrated through experiments on various public-domain categorical data sets.
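The idea of combining two independent rankings can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the outlier definition or the scoring formulas, so the two scores below (average attribute value frequency and relative cluster size) are assumptions chosen only for illustration. The function names, the toy records, and the cluster labels are all hypothetical, and the labels stand in for the output of a clustering phase that is not implemented here.

```python
from collections import Counter

def frequency_scores(records):
    """Assumed score: mean relative frequency of a record's attribute values.
    Lower values mean the record is built from rarer values."""
    n, n_attrs = len(records), len(records[0])
    counts = [Counter(r[j] for r in records) for j in range(n_attrs)]
    return [sum(counts[j][r[j]] for j in range(n_attrs)) / (n * n_attrs)
            for r in records]

def cluster_scores(labels):
    """Assumed score: relative size of the cluster a record belongs to.
    Lower values mean the record sits in a smaller cluster."""
    sizes = Counter(labels)
    return [sizes[label] / len(labels) for label in labels]

def top_k(scores, k):
    """Indices of the k lowest-scoring records, ties broken by index."""
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))[:k]

def likely_outliers(records, labels, k):
    """Union of the top-k candidates from the two independent rankings."""
    by_frequency = top_k(frequency_scores(records), k)
    by_cluster = top_k(cluster_scores(labels), k)
    return sorted(set(by_frequency) | set(by_cluster))

# Hypothetical toy data: six records with three categorical attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
    ("green", "large", "square"),
]
# Hypothetical cluster labels, assumed to come from a prior clustering phase.
labels = [0, 0, 0, 1, 0, 1]

print(likely_outliers(records, labels, k=2))  # [3, 4, 5]
```

In this toy example the frequency ranking flags records 5 and 4, while the cluster ranking flags records 3 and 5, so the union contains a record that each ranking alone would miss. Both scores are computed for every record before the top k are selected, so in this sketch only the final selection step depends on k.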