Automated topic naming

Authors:
Abram Hindle; Neil A. Ernst; Michael W. Godfrey; John Mylopoulos
Affiliations:
Dept. of Computing Science, University of Alberta, Edmonton, Canada; Dept. of Computer Science, University of British Columbia, Vancouver, Canada; David Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada; Dept. Information Eng. and Computer Science, University of Trento, Trento, Italy
Venue:
Empirical Software Engineering
Year:
2013

Abstract

Software repositories provide a deluge of software artifacts to analyze. Researchers have attempted to summarize, categorize, and relate these artifacts by using semi-unsupervised machine-learning algorithms, such as Latent Dirichlet Allocation (LDA). LDA is used for concept and topic analysis to suggest candidate word-lists or topics that describe and relate software artifacts. However, these word-lists and topics are difficult to interpret in the absence of meaningful summary labels. Current attempts to interpret topics assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for the developers. We propose a solution: automated labelled topic extraction. Topics are extracted using LDA from commit-log comments recovered from source control systems. These topics are given labels from a generalizable cross-project taxonomy, consisting of non-functional requirements. Our approach was evaluated with experiments and case studies on three large-scale Relational Database Management System (RDBMS) projects: MySQL, PostgreSQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels that are relevant to these projects, and provide fresh insight into their evolving software development activities.
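
The labelling step described in the abstract can be illustrated with a minimal sketch. It assumes that topics have already been extracted (for example by an LDA run over commit-log comments) and are available as lists of top words, and that each non-functional requirement is represented by a keyword list. The keyword lists, the topic contents, and the names `NFR_KEYWORDS` and `label_topic` are hypothetical and chosen only for illustration; they are not the taxonomy or the procedure used by the authors.

```python
from typing import Dict, List, Set

# Hypothetical keyword lists per non-functional requirement (NFR).
# Illustrative only; not the taxonomy used in the paper.
NFR_KEYWORDS: Dict[str, Set[str]] = {
    "efficiency": {"performance", "slow", "fast", "optimize", "memory", "cache"},
    "portability": {"windows", "linux", "compiler", "platform", "port"},
    "reliability": {"crash", "fail", "error", "recover", "bug"},
}


def label_topic(top_words: List[str], min_hits: int = 1) -> List[str]:
    """Return every NFR label whose keyword list shares at least
    `min_hits` words with the topic's top words (sorted by name)."""
    words = set(top_words)
    labels = [
        nfr
        for nfr, keywords in NFR_KEYWORDS.items()
        if len(words & keywords) >= min_hits
    ]
    return sorted(labels)


# Made-up topic word lists, standing in for the output of a topic model
# run over commit-log comments.
topics: Dict[str, List[str]] = {
    "topic_0": ["fix", "crash", "windows", "build"],
    "topic_1": ["optimize", "cache", "query", "index"],
    "topic_2": ["merge", "branch", "tree"],
}

labelled = {name: label_topic(words) for name, words in topics.items()}

for name, labels in labelled.items():
    print(name, labels if labels else "(no label)")
```

A topic can receive several labels or none in this sketch. Raising `min_hits` trades coverage for precision, which is one simple way to make the labelling stricter.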