Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences

  • Authors:
  • Jorge Silva; Rebecca Willett

  • Affiliations:
  • Duke University, Durham; Duke University, Durham

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2009

Abstract

This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on a hypergraph representation of the data is proposed to handle this very high-dimensional problem. Hypergraphs are an important extension of graphs that allow an edge to connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm is presented that detects anomalies directly in the hypergraph domain, without any feature selection or dimensionality reduction. The resulting estimate can be used to compute a measure of anomalousness based on the False Discovery Rate. The algorithm has $O(np)$ computational complexity, where $n$ is the number of training observations and $p$ is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited to very high-dimensional settings, and the method requires no tuning, bandwidth, or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where $p > 75{,}000$, and the experiments show that it can outperform other state-of-the-art methods.
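The abstract does not spell out the model. What follows is therefore only a rough, hypothetical sketch of the general pipeline it describes:

* Each co-occurrence is treated as a hyperedge, that is, a subset of the $p$ vertices.
* A latent-class model is fitted to the training hyperedges.
* New hyperedges are scored and then flagged using a false-discovery-rate procedure.

The sketch makes several substitutions, none of which come from the paper:

* It uses plain EM for a mixture of independent Bernoulli distributions instead of the paper's variational EM.
* It scores each hyperedge by its negative log-likelihood.
* It converts scores to empirical p-values against the training scores.
* It applies the standard Benjamini-Hochberg procedure.

The function names, the smoothing constant `eps`, and the number of components `k` are also illustrative assumptions.

```python
import math
import random

def _logsumexp(xs):
    # Numerically stable log(sum(exp(x) for x in xs)).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def _component_terms(weights, theta):
    # Per component c, precompute
    #   base[c]       = log w_c + sum_j log(1 - theta[c][j])
    #   logodds[c][j] = log(theta[c][j] / (1 - theta[c][j]))
    # so log p(edge, c) = base[c] + sum of logodds[c][j] over vertices j in the edge.
    base, logodds = [], []
    for w, th in zip(weights, theta):
        base.append(math.log(w) + sum(math.log1p(-t) for t in th))
        logodds.append([math.log(t) - math.log1p(-t) for t in th])
    return base, logodds

def _log_joint(edge, base, logodds):
    # Only the vertices that participate in the hyperedge are visited.
    return [b + sum(lo[j] for j in edge) for b, lo in zip(base, logodds)]

def fit_bernoulli_mixture(edges, p, k=2, iters=50, seed=0, eps=1e-3):
    """Plain (non-variational) EM for a k-component mixture of independent
    Bernoullis over p vertices. edges: list of sets of vertex ids in range(p),
    one set per hyperedge. Returns (weights, theta), where
    theta[c][j] = P(vertex j in edge | component c)."""
    rng = random.Random(seed)
    resp = []
    for _ in edges:  # random soft assignments break the symmetry between components
        r = [rng.random() + 0.1 for _ in range(k)]
        resp.append([x / sum(r) for x in r])
    for _ in range(iters):
        # M-step: floored mixture weights and additively smoothed theta in (0, 1).
        nk = [sum(r[c] for r in resp) for c in range(k)]
        floored = [max(v, 1e-10) for v in nk]
        weights = [v / sum(floored) for v in floored]
        counts = [[0.0] * p for _ in range(k)]
        for r, edge in zip(resp, edges):
            for j in edge:
                for c in range(k):
                    counts[c][j] += r[c]
        theta = [[(counts[c][j] + eps) / (nk[c] + 2 * eps) for j in range(p)]
                 for c in range(k)]
        # E-step: posterior responsibility of each component for each hyperedge.
        base, logodds = _component_terms(weights, theta)
        resp = []
        for edge in edges:
            lj = _log_joint(edge, base, logodds)
            z = _logsumexp(lj)
            resp.append([math.exp(x - z) for x in lj])
    return weights, theta

def anomaly_scores(edges, weights, theta):
    # Score = negative log-likelihood under the fitted mixture (higher = more unusual).
    base, logodds = _component_terms(weights, theta)
    return [-_logsumexp(_log_joint(e, base, logodds)) for e in edges]

def empirical_pvalues(test_scores, train_scores):
    # Fraction of training scores at least as large, with a +1 correction.
    n = len(train_scores)
    return [(1 + sum(t >= s for t in train_scores)) / (n + 1) for s in test_scores]

def benjamini_hochberg(pvals, q=0.1):
    # Standard Benjamini-Hochberg step-up procedure at target FDR level q.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank
    flagged = set(order[:cutoff])
    return [i in flagged for i in range(m)]

# Toy data: two "communities" of vertices, {0, 1, 2} and {5, 6, 7}, out of p = 10.
train = ([{0, 1, 2}, {0, 1}, {1, 2}, {0, 2}] * 5
         + [{5, 6, 7}, {5, 6}, {6, 7}, {5, 7}] * 5)
test = [{0, 1, 2}, {5, 6}, {0, 7, 9}, {3, 4, 8}]
w, th = fit_bernoulli_mixture(train, p=10)
train_scores = anomaly_scores(train, w, th)
test_scores = anomaly_scores(test, w, th)
flags = benjamini_hochberg(empirical_pvalues(test_scores, train_scores), q=0.1)
print(flags)  # [False, False, True, True]
```

Each EM iteration visits only the vertices present in each hyperedge, plus $O(kp)$ per-component bookkeeping. Its cost therefore grows linearly in $n$ and $p$ for fixed `k`, which is the kind of scaling the abstract emphasizes. Unlike the method described above, however, this sketch has knobs (`k`, `eps`, `iters`) that would need to be chosen.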