Finding low-utility data structures

  • Authors:
  • Guoqing Xu; Nick Mitchell; Matthew Arnold; Atanas Rountev; Edith Schonberg; Gary Sevitsky

  • Affiliations:
  • Ohio State University, Columbus, OH, USA; IBM T. J. Watson Research Center, Hawthorne, NY, USA; IBM T. J. Watson Research Center, Hawthorne, NY, USA; Ohio State University, Columbus, OH, USA; IBM T. J. Watson Research Center, Hawthorne, NY, USA; IBM T. J. Watson Research Center, Hawthorne, NY, USA

  • Venue:
  • PLDI '10: Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation
  • Year:
  • 2010

Abstract

Many opportunities for easy, big-win program optimizations are missed by compilers. This is especially true in highly layered Java applications. Often at the heart of these missed optimization opportunities lie computations that, with great expense, produce data values that have little impact on the program's final output. Constructing a new date formatter to format every date, or populating a large set full of expensively constructed structures only to check its size: these involve costs that are out of line with the benefits gained. This disparity between the formation costs and accrued benefits of data structures is at the heart of much runtime bloat. We introduce a run-time analysis to discover these low-utility data structures. The analysis employs dynamic thin slicing, which naturally associates costs with value flows rather than raw data flows. It constructs a model of the incremental, hop-to-hop costs and benefits of each data structure. The analysis then identifies suspicious structures based on imbalances between their incremental costs and benefits. To decrease the memory requirements of slicing, we introduce abstract dynamic thin slicing, which performs thin slicing over bounded abstract domains. We have modified the IBM J9 commercial JVM to implement this approach. We demonstrate two client analyses: one that finds objects that are expensive to construct but are not necessary for the forward execution, and a second that pinpoints ultimately-dead values. We have successfully applied them to large-scale and long-running Java applications. We show that these analyses are effective at detecting operations that have unbalanced costs and benefits.
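
The two motivating patterns named in the abstract (a new date formatter per date, and a large set built only to check its size) can be made concrete with a small Java sketch. It is an illustration only, not code from the paper or from the analysis itself: the class name `LowUtilityExamples`, the helper type `Item`, its 1024-element payload, and all method names are hypothetical. Each wasteful method is paired with a cheaper one that produces the same result, which is the cost/benefit imbalance the analysis is meant to expose.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the two low-utility patterns from the abstract.
public class LowUtilityExamples {

    // Pattern 1 (wasteful): a fresh formatter is constructed for every date.
    static List<String> formatEachWithNewFormatter(List<Date> dates) {
        List<String> out = new ArrayList<>();
        for (Date d : dates) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
            out.add(fmt.format(d));
        }
        return out;
    }

    // Same output, but the formatter is constructed once and reused.
    static List<String> formatWithSharedFormatter(List<Date> dates) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        List<String> out = new ArrayList<>();
        for (Date d : dates) {
            out.add(fmt.format(d));
        }
        return out;
    }

    // Stand-in for an expensively constructed structure (assumed payload size).
    static final class Item {
        final String key;
        final int[] payload = new int[1024];

        Item(String key) { this.key = key; }

        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).key.equals(key);
        }

        @Override public int hashCode() { return key.hashCode(); }
    }

    // Pattern 2 (wasteful): the set of Items is built only so its size can be read.
    static int countViaExpensiveSet(List<String> keys) {
        Set<Item> items = new HashSet<>();
        for (String k : keys) {
            items.add(new Item(k));
        }
        return items.size();
    }

    // Same count without constructing any Item.
    static int countDirectly(List<String> keys) {
        return new HashSet<>(keys).size();
    }

    public static void main(String[] args) {
        List<Date> dates = Arrays.asList(new Date(0L), new Date(86_400_000L));
        List<String> keys = Arrays.asList("a", "b", "a");
        // Both comparisons check that the cheap variant preserves the result.
        System.out.println(formatEachWithNewFormatter(dates).equals(formatWithSharedFormatter(dates)));
        System.out.println(countViaExpensiveSet(keys) == countDirectly(keys));
    }
}
```

In both wasteful variants the constructed objects contribute little to the final output relative to what they cost to build, so a cost-benefit analysis of the kind described above would be expected to rank them as suspicious.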