Approximate graph clustering for program characterization

Authors:
John Demme; Simha Sethumadhavan
Affiliations:
Columbia University, New York; Columbia University, New York
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Year:
2012

Abstract

An important aspect of system optimization research is the discovery of program traits or behaviors. In this paper, we present an automated method of program characterization that examines and clusters program graphs, i.e., dynamic data graphs or control-flow graphs. Our novel approximate graph clustering technique allows users to find groups of program fragments that contain similar code idioms or patterns in data reuse, control flow, and context. Patterns of this nature have several potential applications, including the development of new static or dynamic optimizations to be implemented in software or in hardware. For the SPEC CPU 2006 suite of benchmarks, our results show that approximate graph clustering is effective at grouping behaviorally similar functions. Graph-based clustering also produces clusters that are more homogeneous than those produced by previously proposed non-graph-based clustering methods. Further qualitative analysis of the clustered functions shows that our approach is also able to identify some frequent, unexploited program behaviors. These results suggest that our approximate graph clustering methods could be very useful for program characterization.