We are interested in identifying the domain expertise of the developers of a software system. A developer gains expertise in the code base as well as in the domain of the software system he/she develops. This information is a useful input when allocating software implementation tasks to developers. Domain concepts represented by the system are discovered from the linguistic information available in the source code. The vocabulary contained in the source code, in identifiers such as class, method, and variable names and in comments, is extracted. Concepts present in the code base are identified and grouped based on a well-known text processing hypothesis: words are similar to the extent to which they share similar words. The developer's association with the source code and the concepts it represents is derived from version repository information. To this end, the analysis first derives documents from source code by discarding all programming language constructs. K-means clustering is then used to cluster the documents and extract closely related concepts. The key concepts present in the documents authored by a developer determine his/her domain expertise. To validate our approach, we apply it to large software systems, two of which are presented in detail in this paper.
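The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The keyword list, the sample sources, the author list (which stands in for version repository data), and the function names `extract_terms`, `kmeans`, and `expertise` are all hypothetical. Seeding the centroids with the first k documents and comparing documents by cosine similarity are simplifying assumptions made to keep the example deterministic.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated keyword list; a real one would cover the whole language.
KEYWORDS = {"public", "private", "class", "void", "int", "double", "return", "new"}

def extract_terms(source):
    """Derive a 'document' from source code: keep words from identifiers and
    comments, split camelCase names, and drop language keywords."""
    words = []
    for token in re.findall(r"[A-Za-z]+", source):
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", token):
            word = part.lower()
            if word not in KEYWORDS and len(word) > 2:
                words.append(word)
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def kmeans(vectors, k, iterations=10):
    """Plain k-means over term vectors. Assumption: the first k vectors seed
    the centroids, and closeness is measured by cosine similarity."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                total = Counter()
                for member in members:
                    total.update(member)
                centroids[c] = {t: n / len(members) for t, n in total.items()}
    return labels, centroids

def expertise(authors, labels, centroids, top=3):
    """Map each developer to the heaviest terms of the clusters that contain
    the documents they authored."""
    result = {}
    for doc_index, developer in enumerate(authors):
        centroid = centroids[labels[doc_index]]
        top_terms = sorted(centroid, key=lambda t: (-centroid[t], t))[:top]
        result.setdefault(developer, set()).update(top_terms)
    return result

# Hypothetical sample data: four tiny source files and their authors.
sources = [
    "class AccountLedger { double accountBalance; void creditAccount(double amount) {} }",
    "class ShapeRenderer { void renderShape(Shape shape) {} // render pixel buffer }",
    "class AccountAudit { void auditAccount(Account account) {} // balance check }",
    "class PixelBuffer { void renderPixel(int pixel) {} // render to screen }",
]
authors = ["alice", "bob", "alice", "bob"]

documents = [extract_terms(source) for source in sources]
labels, centroids = kmeans(documents, k=2)
result = expertise(authors, labels, centroids)
for developer in sorted(result):
    print(developer, sorted(result[developer]))
```

In this toy run, the accounting files and the rendering files fall into separate clusters, so each developer is associated with the dominant terms of the cluster they contributed to. A realistic setting would need term weighting, a full keyword list, and authorship taken from actual commit history.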