Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
An empirical investigation into a large-scale Java open source code repository
Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
An experience report on scaling tools for mining software repositories using MapReduce
Proceedings of the IEEE/ACM international conference on Automated software engineering
Journal of Systems and Software
Controversy Corner: On the relationship between comment update practices and Software Bugs
Journal of Systems and Software
Using the GPGPU for scaling up mining software repositories
Proceedings of the 34th International Conference on Software Engineering
An empirical study of the fault-proneness of clone mutation and clone migration
Proceedings of the 10th Working Conference on Mining Software Repositories
Hi-index | 0.00 |
Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.