Techniques for hardware-accelerated parsing for network and bioinformatic applications

  • Authors:
  • Ron K. Cytron; Young H. Cho; James M. Moscola

  • Affiliations:
  • Washington University in St. Louis; Washington University in St. Louis; Washington University in St. Louis

  • Venue:
  • Techniques for hardware-accelerated parsing for network and bioinformatic applications
  • Year:
  • 2008

Abstract

Since the development of the first parsers, parsing has generally been considered a software problem. Software parsers have been developed for many different uses, including compiling software, rendering web pages, and even translating languages. However, as new technologies and discoveries emerge, traditional software techniques for parsing data either cannot keep up with data rates or simply take too long to produce results. This dissertation discusses techniques and architectures for accelerating parsing in two different domains. One requires parsing of high-speed streaming data. The other requires parsing of very large data sets with a computationally complex parsing algorithm.

The first part of this dissertation focuses on architectures for accelerated parsing of network data. As network data rates continue to increase and the volume of data transferred across networks escalates, it will become progressively more difficult for software packet examination techniques to maintain the required throughput. Couple this with the development of new networking technologies, such as content-based routing and publish/subscribe networks, and it is clear that high-speed architectures for parsing packet payloads are required. New architectures for both pattern matching and parsing are presented and compared to existing architectures. Additionally, two example applications are presented. The first is a simple content-based router. The second is an email parser capable of delineating and extracting user-specified portions of email messages.

The second part of this dissertation examines another parsing problem that demands high throughput, but for which many parses are possible for each input and all such parses must be considered. The difficulty of the problem is amplified by the large volumes of data that must be parsed. More specifically, this work investigates techniques and architectures suitable for accelerating the complex parsing algorithm used for discovering new RNA molecules in genome databases. Two different hardware architectures are presented and evaluated against a well-known software suite.