Design patterns: elements of reusable object-oriented software
Design patterns: elements of reusable object-oriented software
A language for specifying recursive traversals of object structures
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Predicting Fault Incidence Using Software Change History
IEEE Transactions on Software Engineering
Visitor combination and traversal control
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Navigating and querying code without getting lost
Proceedings of the 2nd international conference on Aspect-oriented software development
DJ: Dynamic Adaptive Programming in Java
REFLECTION '01 Proceedings of the Third International Conference on Metalevel Architectures and Separation of Crosscutting Concerns
Mining Version Histories to Guide Software Changes
Proceedings of the 26th International Conference on Software Engineering
Use of relative code churn measures to predict system defect density
Proceedings of the 27th international conference on Software engineering
Facilitating software evolution research with kenyon
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Understanding source code evolution using abstract syntax tree matching
MSR '05 Proceedings of the 2005 international workshop on Mining software repositories
Finding application errors and security flaws using PQL: a program query language
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Predicting Faults from Cached History
ICSE '07 Proceedings of the 29th international conference on Software Engineering
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
The visitor pattern as a reusable, generic, type-safe component
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
Sourcerer: mining and searching internet-scale software repositories
Data Mining and Knowledge Discovery
Predicting faults using the complexity of code changes
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
An empirical investigation into a large-scale Java open source code repository
Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
A study of the uniqueness of source code
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Mining evolution of object usage
Proceedings of the 25th European conference on Object-oriented programming
Mining Cause-Effect-Chains from Version Histories
ISSRE '11 Proceedings of the 2011 IEEE 22nd International Symposium on Software Reliability Engineering
CodeQuest: scalable source code queries with datalog
ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Querying source code with natural language
ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
Temporal analysis of API usage concepts
Proceedings of the 34th International Conference on Software Engineering
Understanding myths and realities of test-suite evolution
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
How do developers use parallel libraries?
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Boa: a language and infrastructure for analyzing ultra-large-scale software repositories
Proceedings of the 2013 International Conference on Software Engineering
Portfolio: Searching for relevant functions and their usages in millions of lines of code
ACM Transactions on Software Engineering and Methodology (TOSEM) - Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance
Hi-index | 0.00 |
Software repositories contain a vast wealth of information about software development. Mining these repositories has proven useful for detecting patterns in software development, testing hypotheses for new software engineering approaches, etc. Specifically, mining source code has yielded significant insights into software development artifacts and processes. Unfortunately, mining source code at a large-scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse-grained, or sacrifice studying the history of the code due to both human and computational scalability issues. In this paper we address the substantial challenges of mining source code: a) at a very large scale; b) at a fine-grained level of detail; and c) with full history information. To address these challenges, we present domain-specific language features for source code mining. Our language features are inspired by object-oriented visitors and provide a default depth-first traversal strategy along with two expressions for defining custom traversals. We provide an implementation of these features in the Boa infrastructure for software repository mining and describe a code generation strategy into Java code. To show the usability of our domain-specific language features, we reproduced over 40 source code mining tasks from two large-scale previous studies in just 2 person-weeks. The resulting code for these tasks show between 2.0x--4.8x reduction in code size. Finally we perform a small controlled experiment to gain insights into how easily mining tasks written using our language features can be understood, with no prior training. We show a substantial number of tasks (77%) were understood by study participants, in about 3 minutes per task.