StreamGen: A Workload Generation Tool for Distributed Information Flow Applications

  • Authors:
  • Mohamed Mansour; Matthew Wolf; Karsten Schwan

  • Affiliations:
  • Georgia Institute of Technology; Georgia Institute of Technology; Georgia Institute of Technology

  • Venue:
  • ICPP '04: Proceedings of the 2004 International Conference on Parallel Processing
  • Year:
  • 2004

Abstract

This paper presents the StreamGen load generator, which targets distributed information flow applications. These include the event streaming services used in wide-area publish/subscribe systems and in operational information systems, the data streaming services used in remote visualization and collaboration, and the continuous data streams that occur in download services. These services run across heterogeneous distributed platforms and are implemented by computational components that capture, manipulate, and produce information streams and that are linked via overlay topologies. StreamGen can reproduce the distributed computational and communication loads that such applications impose. Dynamic application behaviors can be specified mathematically or derived from behavior traces collected at the application level. One notable set of traces presented in this paper comes from long-term observation of FTP download patterns at the Linux mirror site run by the CERCS research center at the Georgia Institute of Technology. Two flow-based applications are built and evaluated with StreamGen. The first emulates the data streaming behavior of a distributed scientific collaboration, in which a scientific simulation (a molecular dynamics code) produces data that is sent to and displayed for multiple interactive remote users. The second emulates portions of the event streaming behavior of an operational information system used by a large U.S. corporation. Parametric studies that apply StreamGen's FTP traces to these applications are used to evaluate different load balancing strategies for the cluster machines that process these applications' data streams.
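The abstract distinguishes two ways of driving load: from a mathematical specification or from recorded traces. The sketch below illustrates both modes and how the resulting events could feed a load-balancing comparison. It is not StreamGen's API; every name here (`LoadEvent`, `spec_driven`, `trace_driven`, `assign`) is hypothetical. It assumes a Poisson arrival process with exponentially distributed sizes for the "specification" mode, assumes traces are `(timestamp, bytes)` records such as might be extracted from FTP transfer logs, and models node load crudely as cumulative bytes assigned.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class LoadEvent:
    time_s: float    # emission time, seconds from the start of the run
    size_bytes: int  # payload size of the event or transfer


def spec_driven(rate_per_s: float, mean_size: int, duration_s: float,
                seed: int = 0) -> Iterator[LoadEvent]:
    """Hypothetical 'mathematical specification' mode (assumed Poisson):
    exponential inter-arrival times and exponential payload sizes."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        size = max(1, int(rng.expovariate(1.0 / mean_size)))
        yield LoadEvent(t, size)
        t += rng.expovariate(rate_per_s)


def trace_driven(trace: List[Tuple[float, int]],
                 speedup: float = 1.0) -> Iterator[LoadEvent]:
    """Hypothetical trace-replay mode: sort (timestamp, bytes) records,
    rebase them to start at zero, and optionally compress time."""
    if not trace:
        return
    t0 = min(ts for ts, _ in trace)
    for ts, size in sorted(trace):
        yield LoadEvent((ts - t0) / speedup, size)


def assign(events: List[LoadEvent], n_nodes: int, strategy: str) -> List[int]:
    """Assign each event to a node and return cumulative bytes per node
    (a deliberately simple load model, assumed for illustration)."""
    load = [0] * n_nodes
    for i, ev in enumerate(events):
        if strategy == "round_robin":
            node = i % n_nodes
        elif strategy == "least_loaded":
            node = min(range(n_nodes), key=lambda k: load[k])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        load[node] += ev.size_bytes
    return load
```

For example, replaying the three-record trace `[(1000.0, 4096), (1002.5, 1_048_576), (1001.0, 512)]` at `speedup=2.0` yields events at 0.0, 0.5, and 1.25 seconds. Passing those events to `assign` with two nodes gives `[1052672, 512]` under round-robin and `[4096, 1049088]` under least-loaded. Keeping generation separate from assignment means the same event stream, synthetic or trace-derived, can be replayed against each strategy being compared.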