Small subset queries and bloom filters using ternary associative memories, with applications

  • Authors:
  • Ashish Goel;Pankaj Gupta

  • Affiliations:
  • Stanford University, Stanford, CA, USA;Twitter, Inc., San Francisco, CA, USA

  • Venue:
  • Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Associative memories offer high levels of parallelism in matching a query against stored entries. We design and analyze an architecture which uses single lookup into a Ternary Content Addressable Memory (TCAM) to solve the subset query problem for small sets, i.e., to check whether a given set (the query) contains (or alternately, is contained in) any one of a large collection of sets in a database. We use each TCAM entry as a small Ternary Bloom Filter (each 'bit' of which is one of {0,1,wildcard}) to store one of the sets in the collection. Like Bloom filters, our architecture is susceptible to false positives. Since each TCAM entry is quite small, asymptotic analyses of Bloom filters do not directly apply. Surprisingly, we are able to show that the asymptotic false positive probability formula can be safely used if we penalize the small Bloom filter by taking away just one bit of storage and adding just half an extra set element before applying the formula. We believe that this analysis is independently interesting. The subset query problem has applications in databases, network intrusion detection, packet classification in Internet routers, and Information Retrieval. We demonstrate our architecture on one illustrative streaming application -- intrusion detection in network traffic. Be shingling (i.e., taking consecutive bytes of) the strings in the database, we can perform a single subset query and hence a single TCAM search, to skip many bytes in the stream. We evaluate our scheme on the open source CLAM anti-virus database, for worst-case as well as random streams. Our architecture appears to be at least one order of magnitude faster than previous approaches. Since the individual Bloom filters must fit in a single TCAM entry (currently 72 to 576 bits), our solution applies only when each set is of a small cardinality. However, this is sufficient for many typical applications. Also, recent algorithms for the subset-query problem use a small-set version as a subroutine