Recommending random walks

Authors:
Zachary M. Saul;Vladimir Filkov;Premkumar Devanbu;Christian Bird
Affiliations:
University of California: Davis, Davis, CA;University of California: Davis, Davis, CA;University of California: Davis, Davis, CA;University of California: Davis, Davis, CA
Venue:
Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Year:
2007

Citing 18
Cited 16

Program understanding: challenge for the 1990's

IBM Systems Journal
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Data mining library reuse patterns using generalized association rules

Proceedings of the 22nd international conference on Software engineering
Bugs as deviant behavior: a general approach to inferring errors in systems code

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Mining specifications

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Component rank: relative significance rank for software component search

Proceedings of the 25th International Conference on Software Engineering
A comparison of methods for locating features in legacy software

Journal of Systems and Software
Predicting Source Code Changes by Mining Change History

IEEE Transactions on Software Engineering
Mining Version Histories to Guide Software Changes

IEEE Transactions on Software Engineering
Hipikat: A Project Memory for Software Development

IEEE Transactions on Software Engineering
Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques

IEEE Transactions on Software Engineering
Automatic generation of suggestions for program investigation

Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
DynaMine: finding common error patterns by mining software revision histories

Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code

Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
SNIAFL: Towards a static noninteractive approach to feature location

ACM Transactions on Software Engineering and Methodology (TOSEM)
Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification

ICPC '06 Proceedings of the 14th IEEE International Conference on Program Comprehension
MAPO: mining API usages from open source repositories

Proceedings of the 2006 international workshop on Mining software repositories
Efficiently mining crosscutting concerns through random walks

Proceedings of the 6th international conference on Aspect-oriented software development

Developing natural language-based program analyses and tools to expedite software maintenance

Companion of the 30th international conference on Software engineering
Not all classes are created equal: toward a recommendation system for focusing testing

Proceedings of the 2008 international workshop on Recommendation systems for software engineering
Understanding interaction differences between newcomer and expert programmers

Proceedings of the 2008 international workshop on Recommendation systems for software engineering
API hyperlinking via structural overlap

Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
DebugAdvisor: a recommender system for debugging

Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
MAPO: Mining and Recommending API Usage Patterns

Proceedings of the 23rd European Conference on Object-Oriented Programming (ECOOP 2009), Genoa
Codebook: discovering and exploiting relationships in software repositories

Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Investigating how to effectively combine static concern location techniques

Proceedings of the 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation
Flow-augmented call graph: a new foundation for taming API complexity

FASE'11/ETAPS'11 Proceedings of the 14th international conference on Fundamental approaches to software engineering: part of the joint European conferences on theory and practice of software
The tradeoffs of societal computing

Proceedings of the 10th SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and software
Asking and answering questions about unfamiliar APIs: an exploratory study

Proceedings of the 34th International Conference on Software Engineering
On the naturalness of software

Proceedings of the 34th International Conference on Software Engineering
Societal computing

Proceedings of the 34th International Conference on Software Engineering
Integrating information retrieval, execution and link analysis algorithms to improve feature location in software

Empirical Software Engineering
Automatically mining software-based, semantically-similar words from comment-code mappings

Proceedings of the 10th Working Conference on Mining Software Repositories
Portfolio: Searching for relevant functions and their usages in millions of lines of code

ACM Transactions on Software Engineering and Methodology (TOSEM) - Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance

Abstract

We improve on previous recommender systems by taking advantage of the layered structure of software. We use a random-walk approach that mimics the more focused behavior of a developer who browses the caller-callee links in the callgraph of a large program, seeking routines that are likely to be related to a function of interest. Inspired by Kleinberg's work [10], we approximate the steady state of an infinite random walk on a subset of a callgraph in order to rank the functions by their steady-state probabilities. Surprisingly, this purely structural approach works quite well. Our approach, like Robillard's "Suade" algorithm [15] and earlier data mining approaches [13], relies solely on the always-available current state of the code, rather than on other sources such as comments, documentation, or revision information. Using the Apache API documentation as an oracle, we perform a quantitative evaluation of our method, finding that our algorithm dramatically improves upon Suade in this setting. We also find that the performance of traditional data mining approaches is complementary to ours; this leads naturally to an evidence-based combination of the two, which shows excellent performance on this task.
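
The core idea of ranking functions by the steady-state probabilities of a random walk over caller-callee links can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give its details, so the code below substitutes a generic power iteration with a teleport step (in the style of PageRank) on a toy callgraph. The function name `rank_by_random_walk`, the `damping` and `iterations` parameters, the choice to follow edges in both directions, and the toy function names are all assumptions made for illustration.

```python
def rank_by_random_walk(call_graph, damping=0.85, iterations=100):
    """Rank functions by approximate steady-state visit probability.

    call_graph maps a caller name to a list of callee names.
    Assumption: the walker follows caller-callee links in both
    directions, and with probability (1 - damping) jumps to a
    uniformly chosen function so the iteration converges.
    """
    # Build an undirected neighbor map from the caller -> callees map.
    neighbors = {}
    for caller, callees in call_graph.items():
        neighbors.setdefault(caller, set())
        for callee in callees:
            neighbors.setdefault(callee, set())
            if callee != caller:  # ignore self-recursion
                neighbors[caller].add(callee)
                neighbors[callee].add(caller)

    nodes = sorted(neighbors)
    n = len(nodes)
    prob = {node: 1.0 / n for node in nodes}  # start uniform

    for _ in range(iterations):
        # Teleport mass is spread evenly over all functions.
        nxt = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            linked = neighbors[node]
            if linked:
                share = damping * prob[node] / len(linked)
                for other in linked:
                    nxt[other] += share
            else:
                # An isolated function passes its mass to everyone.
                share = damping * prob[node] / n
                for other in nodes:
                    nxt[other] += share
        prob = nxt

    # Highest steady-state probability first.
    return sorted(prob.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy callgraph: three functions call "hub",
# which in turn calls "util".
toy_graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["util"]}
ranking = rank_by_random_walk(toy_graph)
for name, probability in ranking:
    print(f"{name}: {probability:.3f}")
```

In this toy graph the most connected function ends up first in the ranking, which is the intuition behind using steady-state probabilities as a relevance score. Restricting the walk to a subset of the callgraph around a function of interest, as the abstract describes, would amount to passing only that neighborhood as `call_graph`.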