Analyzing large-scale object-oriented software to find and remove runtime bloat

  • Authors:
  • Atanas Rountev; Guoqing Xu

  • Affiliations:
  • The Ohio State University; The Ohio State University

  • Venue:
  • The Ohio State University (Ph.D. dissertation)
  • Year:
  • 2011

Abstract

In this dissertation, the term bloat refers to the general phenomenon of performing excessive work and consuming excessive memory to accomplish seemingly simple tasks. Bloat is common in large-scale object-oriented programs, and it is a major obstacle to bridging the productivity-performance gap between managed and unmanaged languages. The overarching goal of our work is to find large bloat-related optimization opportunities with a small amount of developer time. As the fundamental methodology for achieving this goal, we advocate tool-assisted manual optimization, which combines developer insight with automated tool support.

Bloat consists of wasteful operations that, while not strictly necessary for forward progress, are executed nevertheless. We propose novel dynamic analysis techniques that detect such wasteful operations and produce the information a programmer needs to pinpoint performance bottlenecks. The first contribution of this dissertation is one such analysis, copy profiling. It is based on the observation that wasteful operations often consist of copy activities that move data among heap locations without performing any useful computation. By profiling copies, this analysis identifies program regions that contain large volumes of copies, as well as data structures whose construction involves data copied frequently from other data structures.

In contrast to this “from-symptom-to-cause” approach, which finds bloat through the symptoms it manifests, the second dynamic analysis this dissertation advocates attempts to capture the center of bloat directly: the set of operations that are expensive to execute yet produce values of little benefit. We demonstrate, using real-world examples, that this technique can also be adapted to solve a range of backward data-flow problems efficiently.
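The copy-chain symptom that copy profiling targets can be sketched with a small example of our own (it is not taken from the dissertation): the same values are moved across three heap locations before a single useful computation happens, so a copy profiler would attribute a high copy volume to this region.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: each input element is copied twice (and boxed)
// before it contributes to the one useful computation, the summation.
public class CopyChain {
    static int sum(int[] raw) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : raw) boxed.add(v);                    // copy 1: array -> list
        Integer[] staging = boxed.toArray(new Integer[0]); // copy 2: list -> array
        int total = 0;
        for (Integer v : staging) total += v;              // the only useful work
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4})); // prints 10
    }
}
```

An optimized version would sum `raw` directly; the two intermediate containers exist only to move data, which is exactly the kind of activity copy profiling surfaces.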
With the help of a variety of data-aggregation approaches, these analyses help a programmer quickly find potential performance problems.

The third contribution of the dissertation is a novel container-based heap-tracking technique, based on the observation that many memory leaks in Java programs occur because containers keep references to unused data entries. By profiling containers and understanding their semantics, it becomes much easier to track down the causes of memory leaks than with existing leak detection approaches that track arbitrary objects. As the fourth contribution of this dissertation, we propose a specification-based dynamic technique called LeakChaser. LeakChaser brings high-level application semantics into low-level leak detection by allowing programmers to specify and infer object liveness properties. This new technique exploits object lifetime relationships and uses varying levels of abstraction to help both experts and novices quickly explore leaky behavior and pinpoint the cause of a leak.

Using these four dynamic analyses, we have found many interesting bloat patterns that can be regularly observed in the execution of large Java programs. A further step toward avoiding bloat is to develop static analyses that find and remove such patterns during application development, so that small performance issues are prevented before they accumulate and become significant.

One such pattern is the inefficient use of Java containers. The fifth contribution of this dissertation is a static analysis that identifies inefficiencies in the use of containers, regardless of inputs and runs. Specifically, it detects underutilized and overpopulated containers by employing a context-free-language (CFL) reachability formulation of container operations and by exploiting container-specific properties. The analysis is both client-driven and demand-driven.
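The container-induced leak pattern that the third contribution targets can be sketched as follows; the class and method names are hypothetical, chosen only for illustration. A long-lived map keeps strong references to entries the program will never look up again, so the garbage collector cannot reclaim them until they are removed explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a container-induced leak: the cache outlives
// the sessions it describes, so stale entries pin their data in memory.
public class SessionCache {
    private final Map<String, byte[]> cache = new HashMap<>();

    void put(String sessionId, byte[] data) { cache.put(sessionId, data); }

    byte[] get(String sessionId) { return cache.get(sessionId); }

    // The remediation a container-aware leak report points to: stale
    // entries must be removed (or the map made size-bounded / weak).
    void endSession(String sessionId) { cache.remove(sessionId); }

    int size() { return cache.size(); }

    public static void main(String[] args) {
        SessionCache c = new SessionCache();
        c.put("a", new byte[1024]);
        c.put("b", new byte[1024]);
        c.endSession("a"); // without this call, entry "a" leaks
        System.out.println(c.size()); // prints 1
    }
}
```

Tracking the container (its put/get/remove history) rather than individual `byte[]` objects is what lets a container-based detector report the stale-entry cause directly.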
The analysis always generates highly precise reports, trading soundness for scalability. We show that it exhibits a small false-positive rate and that large optimization opportunities can be found by inspecting the generated reports. (Abstract shortened by UMI.)
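The two container inefficiencies the static analysis reports, underutilized and overpopulated containers, can be illustrated with a small sketch of our own (the method names are hypothetical, not from the dissertation):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContainerPatterns {
    // Underutilized: a HashMap allocated to hold a single, statically known
    // pair; a plain field would avoid the hashing machinery entirely.
    static Map<String, String> underutilized() {
        Map<String, String> m = new HashMap<>();
        m.put("owner", "admin"); // the only element this map ever holds
        return m;
    }

    // Overpopulated: every element is added, but the program only ever
    // retrieves the first one, so most add operations are wasted work.
    static int overpopulated(int[] values) {
        List<Integer> all = new ArrayList<>();
        for (int v : values) all.add(v); // n adds ...
        return all.get(0);               // ... one get
    }

    public static void main(String[] args) {
        System.out.println(underutilized().size());            // prints 1
        System.out.println(overpopulated(new int[]{7, 8, 9})); // prints 7
    }
}
```

A static analysis reasoning about container operations (rather than generic heap accesses) can see that the first map's size never exceeds one and that the second list's retrieved elements are a small subset of those inserted, which is the essence of the two reported patterns.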