HOT: hypergraph-based outlier test for categorical data

  • Authors:
  • Li Wei;Weining Qian;Aoying Zhou;Wen Jin;Jeffrey X. Yu

  • Affiliations:
  • Department of Computer Science and Engineering, Fudan University;Department of Computer Science and Engineering, Fudan University;Department of Computer Science and Engineering, Fudan University;Department of Computer Science, Simon Fraser University;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong

  • Venue:
  • PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2003

Abstract

Outlier detection is a widely used data mining technique that aims to find anomalies and explain them well. Most existing methods are designed for numeric data and run into problems in real-life applications that contain categorical data. In this paper, we introduce a novel outlier mining method based on a hypergraph model. Because hypergraphs precisely capture the distribution characteristics of data subspaces, the method is effective at identifying anomalies in dense subspaces and offers good interpretations of local outlierness. Selecting the most relevant subspaces also ameliorates the "curse of dimensionality" in very large databases. Furthermore, connectivity replaces distance metrics, so distance-based computation is no longer needed, which improves robustness when handling data with missing values. Because connectivity computation can rely on the aggregation operations supported by most SQL-compatible database systems, the mining process is much more efficient. Finally, experiments and analysis show that our method finds outliers in categorical data with good performance and quality.
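The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a hyperedge is the group of records sharing a set of attribute values, and that a record is suspicious when its value on another attribute is unusually rare inside that group. The function names, the standardized-count score, and the threshold of -1.0 are all hypothetical choices made for illustration. Only counting is involved and no distance is computed; a missing value is simply left out of the counts.

```python
from collections import Counter
from statistics import mean, pstdev

def hyperedge(records, pattern):
    """Records that contain every attribute=value pair in `pattern` (assumed hyperedge)."""
    return [r for r in records if all(r.get(a) == v for a, v in pattern.items())]

def deviations(members, attribute):
    """Pair each member with a standardized count of its value on `attribute`.

    Assumed score: (count of the member's value - mean count) / std of counts,
    computed over the distinct values seen inside the hyperedge.
    """
    counts = Counter(r[attribute] for r in members if r.get(attribute) is not None)
    if not counts:
        return []
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    scored = []
    for r in members:
        v = r.get(attribute)
        # Missing values and uniform distributions get a neutral score of 0.
        score = 0.0 if v is None or sigma == 0 else (counts[v] - mu) / sigma
        scored.append((r, score))
    return scored

def outliers(records, pattern, attribute, threshold=-1.0):
    """Members of the hyperedge whose score falls below the (hypothetical) threshold."""
    members = hyperedge(records, pattern)
    return [r for r, s in deviations(members, attribute) if s < threshold]

# Toy data: among "young" customers, one drives a limousine.
records = (
    [{"age": "young", "car": "compact"} for _ in range(8)]
    + [{"age": "young", "car": "sedan"} for _ in range(7)]
    + [{"age": "young", "car": "limousine"}]
    + [{"age": "young", "car": None}]            # missing value, ignored in counts
    + [{"age": "old", "car": "limousine"} for _ in range(5)]
)
found = outliers(records, {"age": "young"}, "car")
```

In this toy data the limousine is common among "old" customers but rare among "young" ones, so only the young limousine driver is flagged. This illustrates the kind of local, subspace-specific explanation the abstract describes.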