Mining data from a single repository to study a software project has been a very active research field in software engineering in recent years. However, little effort has been devoted to studies that integrate data from several repositories holding different kinds of information, which would make it possible, for instance, to track the different activities of developers. One of the main problems in such multi-repository studies is that developers use different identities when they interact with different tools in different contexts. As a result, the same developer appears as several distinct entities when data is mined from different repositories (and, in some cases, even from a single one). In this paper we propose a heuristic-based approach to identify the multiple identities of each developer in such cases, together with a data structure that allows both the anonymized distribution of information and the tracking of identities for verification purposes. The methodology is presented in general terms and applied to the GNOME project as a case study. Privacy issues and partial merging with new data sources are also discussed.
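To make the idea of heuristic identity merging more concrete, the following is a minimal illustrative sketch in Python. It is not the method proposed in the paper, whose heuristics and data structure are not detailed in this abstract. It assumes two simple heuristics (identical e-mail address, identical normalized full name), and the function names `normalize`, `keys_for`, `merge_identities`, and `anonymize`, the salt value, and the sample identities are all hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(name):
    """Lowercase a name, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def keys_for(identity):
    """Return the heuristic match keys for one (name, email) identity.

    Assumed heuristics (illustrative only): same e-mail address, or same
    normalized full name. Single-token names are skipped as too ambiguous.
    """
    name, email = identity
    keys = set()
    if email:
        keys.add(("email", email.lower()))
    norm = normalize(name)
    if len(norm.split()) >= 2:
        keys.add(("name", norm))
    return keys


def merge_identities(identities):
    """Group identities that share at least one match key (union-find)."""
    parent = {ident: ident for ident in identities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for ident in identities:
        for key in keys_for(ident):
            if key in first_seen:
                parent[find(ident)] = find(first_seen[key])
            else:
                first_seen[key] = ident

    groups = defaultdict(set)
    for ident in identities:
        groups[find(ident)].add(ident)
    return list(groups.values())


def anonymize(groups, salt):
    """Map each merged group to a salted hash.

    The hashes alone can be distributed; the full table is kept privately
    so that merges can be traced back and verified.
    """
    table = {}
    for group in groups:
        canonical = "|".join(sorted(f"{n}<{e}>" for n, e in group))
        digest = hashlib.sha256((salt + canonical).encode("utf-8")).hexdigest()
        table[digest[:16]] = group
    return table


# Hypothetical identities as they might appear in a version control log,
# a mailing list archive, and a bug tracker.
identities = [
    ("Jane Doe", "jane@example.org"),
    ("jane doe", "jdoe@gnome.example"),
    ("J. Doe", "jane@example.org"),
    ("John Smith", "john@example.com"),
]
groups = merge_identities(identities)
lookup = anonymize(groups, salt="project-secret")
```

In this sketch the three "Doe" identities end up in one group because they are linked through a shared e-mail address or a shared normalized name, while "John Smith" stays separate. Real data would need more conservative heuristics and manual verification, since exact-name matching can wrongly merge distinct people.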