Data science for software engineering

Authors:
Tim Menzies;Ekrem Kocaguneli;Fayola Peters;Burak Turhan;Leandro L. Minku
Affiliations:
West Virginia University, USA;West Virginia University, USA;West Virginia University, USA;University of Oulu, Finland;University of Birmingham, UK
Venue:
Proceedings of the 2013 International Conference on Software Engineering
Year:
2013


Abstract

Target audience: Software practitioners and researchers who want to understand the state of the art in using data science for software engineering (SE).

Content: In the age of big data, data science (the knowledge of deriving meaningful outcomes from data) is an essential skill for software engineers. It can be used to predict useful information about new projects based on completed projects. This tutorial offers core insights into the state of the art in this important field.

What participants will learn:

Before data science: the tutorial discusses the tasks needed to deploy machine-learning algorithms in organizations (Part 1: Organization Issues).

During data science: from discretization to clustering to dichotomization and statistical analysis.

And the rest: When local data is scarce, we show how to adapt data from other organizations to local problems. When privacy concerns block access, we show how to privatize data while still being able to mine it. When working with data of dubious quality, we show how to prune spurious information. When data or models seem too complex, we show how to simplify data mining results. When data is too scarce to support intricate models, we show methods for generating predictions. When the world changes and old models need to be updated, we show how to handle those updates. When the effect is too complex for one model, we show how to reason across ensembles of models.

Pre-requisites: This tutorial makes minimal use of maths or advanced algorithms and should be understandable to developers and technical managers.
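
As a rough illustration of two ideas named in the abstract (predicting for a new project from completed projects, and reasoning across an ensemble of models), the sketch below combines a few nearest-neighbour effort estimators. It is not taken from the tutorial: the project data, the feature choice (size and team size), and the function names `knn_estimate` and `ensemble_estimate` are all hypothetical.

```python
from statistics import mean, median

# Hypothetical completed projects: (size in KLOC, team size, effort in person-months).
# These values are invented for illustration; they are not from the tutorial.
COMPLETED = [
    (10.0, 3, 24.0),
    (12.0, 4, 30.0),
    (50.0, 10, 160.0),
    (55.0, 12, 180.0),
    (100.0, 20, 400.0),
]


def make_scaler(rows):
    """Return a function that min-max scales a feature row using the ranges in `rows`."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )

    return scale


def knn_estimate(new_features, k, combine):
    """Estimate effort from the k most similar completed projects."""
    scale = make_scaler([p[:2] for p in COMPLETED])
    target = scale(new_features)

    def distance(project):
        scaled = scale(project[:2])
        return sum((a - b) ** 2 for a, b in zip(scaled, target)) ** 0.5

    nearest = sorted(COMPLETED, key=distance)[:k]
    return combine([p[2] for p in nearest])


def ensemble_estimate(new_features):
    """Combine several simple estimators by taking the median of their predictions."""
    solo_predictions = [
        knn_estimate(new_features, 1, mean),
        knn_estimate(new_features, 3, mean),
        knn_estimate(new_features, 3, median),
    ]
    return median(solo_predictions)


if __name__ == "__main__":
    # A new, hypothetical project: 11 KLOC with a team of 3.
    print(ensemble_estimate((11.0, 3)))
```

Taking the median of the individual predictions is one simple way to keep a single poorly suited estimator from dominating the result; other combination schemes are equally plausible.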