IBM Streams Processing Language: Analyzing Big Data in motion

  • Authors:
  • M. Hirzel; H. Andrade; B. Gedik; G. Jacques-Silva; R. Khandekar; V. Kumar; M. Mendell; H. Nasgaard; S. Schneider; R. Soulé; K.-L. Wu

  • Affiliations:
  • IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY; Goldman Sachs, New York, NY; Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY; Knight Capital Group, Jersey City, NJ; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY; IBM Canada, Markham, ON, Canada; IBM Canada, Markham, ON, Canada; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY; Cornell University, Department of Computer Science, Ithaca, NY; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2013

Abstract

The IBM Streams Processing Language (SPL) is the programming language for IBM InfoSphere® Streams, a platform for analyzing Big Data in motion. By "Big Data in motion," we mean continuous data streams at high data-transfer rates. InfoSphere Streams processes such data with both high throughput and short response times. To meet these performance demands, it deploys each application on a cluster of commodity servers. SPL abstracts away the complexity of the distributed system, instead exposing a simple graph-of-operators view to the user. SPL has several innovations relative to prior streaming languages. For performance and code reuse, SPL provides a code-generation interface to C++ and Java®. To facilitate writing well-structured and concise applications, SPL provides higher-order composite operators that modularize stream sub-graphs. Finally, to enable static checking while exposing optimization opportunities, SPL provides a strong type system and user-defined operator models. This paper provides a language overview, describes the implementation including optimizations such as fusion, and explains the rationale behind the language design.