A two-armed bandit collective for examplar based mining of frequent itemsets with applications to intrusion detection

  • Authors:
  • Vegard Haugland;Marius Kjølleberg;Svein-Erik Larsen;Ole-Christoffer Granmo

  • Affiliations:
  • University of Agder, Grimstad, Norway;University of Agder, Grimstad, Norway;University of Agder, Grimstad, Norway;University of Agder, Grimstad, Norway

  • Venue:
  • ICCCI'11 Proceedings of the Third international conference on Computational collective intelligence: technologies and applications - Volume Part I
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Over the last decades, frequent itemset mining has become a major area of research, with applications including indexing and similarity search, as well as mining of data streams, web, and software bugs. Although several efficient techniques for generating frequent itemsets with a minimum support (frequency) have been proposed, the number of itemsets produced is in many cases too large for effective usage in real-life applications. Indeed, the problem of deriving frequent itemsets that are both compact and of high quality, remains to a large degree open. In this paper we address the above problem by posing frequent itemset mining as a collection of interrelated two-armed bandit problems. In brief, we seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected examplar itemset, a collective of Tsetlin automata based two-armed bandit players aims to learn which items should be included in the frequent itemset. A novel reinforcement scheme allows the bandit players to learn this in a decentralized and on-line manner by observing one itemset at a time. Since each bandit player learns simply by updating the state of a finite automaton, and since the reinforcement feedback is calculated purely from the present itemset and the corresponding decisions of the bandit players, the resulting memory footprint is minimal. Furthermore, computational complexity grows merely linearly with the cardinality of the examplar itemset. The proposed scheme is extensively evaluated using both artificial data as well as data from a real-world network intrusion detection application. The results are conclusive, demonstrating an excellent ability to find frequent itemsets at various level of support. Furthermore, the sets of frequent itemsets produced for network instrusion detection are compact, yet accurately describe the different types of network traffic present.