A small cache of large ranges: Hardware methods for efficiently searching, storing, and updating big dataflow tags

  • Authors:
  • Mohit Tiwari;Banit Agrawal;Shashidhar Mysore;Jonathan Valamehr;Timothy Sherwood

  • Affiliations:
  • Department of Computer Science, University of California, Santa Barbara, USA;Department of Computer Science, University of California, Santa Barbara, USA;Department of Computer Science, University of California, Santa Barbara, USA;Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA;Department of Computer Science, University of California, Santa Barbara, USA

  • Venue:
  • Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Dynamically tracking the flow of data within a microprocessor creates many new opportunities to detect and track malicious or erroneous behavior, but these schemes all rely on the ability to associate tags with all of virtual or physical memory. If one wishes to store large 32-bit tags, multiple tags per data element, or tags at the granularity of bytes rather than words, then directly storing one tag on chip to cover one byte or word (in a cache or otherwise) can be an expensive proposition. We show that dataflow tags in fact naturally exhibit a very high degree of spatial-value locality, an observation we can exploit by storing metadata on ranges of addresses (which cover a non-aligned contiguous span of memory) rather than on individual elements. In fact, a small 128 entry on-chip range cache (with area equivalent to 4KB of SRAM) hits more than 98% of the time on average. The key to this approach is our proposed method by which ranges of tags are kept in cache in an optimally RLE-compressed form, queried at high speed, swapped in and out with secondary memory storage, and (most important for dataflow tracking) rapidly stitched together into the largest possible ranges as new tags are written on every store, all the while correctly handling the cases of unaligned and overlapping ranges. We examine the effectiveness of this approach by simulating its use in definedness tracking (covering both the stack and the heap), in tracking network-derived dataflow through a multi-language web application, and through a synthesizable prototype implementation.