Benchmarking Kappa: Interrater Agreement in Software Process Assessments

  • Authors:
  • Khaled El Emam

  • Affiliations:
  • Fraunhofer Institute for Experimental Software Engineering, Sauerwiesen 6, D-67661 Kaiserslautern, Germany

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 1999

Abstract

Software process assessments are by now a prevalent tool for process improvement and contract risk assessment in the software industry. Given that scores are assigned to processes during an assessment, a process assessment can be considered a subjective measurement procedure. As with any subjective measurement procedure, the reliability of process assessments has important implications for the utility of assessment scores, and therefore the reliability of assessments can be taken as a criterion for evaluating an assessment's quality. The particular type of reliability of interest in this paper is interrater agreement. Thus far, empirical evaluations of the interrater agreement of assessments have used Cohen's Kappa coefficient. Once a Kappa value has been derived, the next question is "how good is it?" Benchmarks for interpreting the obtained values of Kappa are available from the social sciences and medical literature. However, the applicability of these benchmarks to the software process assessment context is not obvious. In this paper we develop a benchmark for interpreting Kappa values using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment). The benchmark indicates that Kappa values below 0.45 are poor, and values above 0.62 constitute substantial agreement and should be the minimum aimed for. This benchmark can be used to decide how good an assessment's reliability is.
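
For context, Cohen's Kappa corrects the observed proportion of agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The minimal Python sketch below shows the standard two-rater computation and annotates the result with the thresholds proposed in this paper; the rating data and the `cohens_kappa` helper are illustrative assumptions, not the paper's own implementation or the SPICE Trials data.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two raters scoring the same items (generic sketch;
    the exact computation used in the SPICE Trials analysis may differ)."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: proportion of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: computed from each rater's marginal rating frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical N/P/L/F-style process attribute ratings from two assessors.
ratings_a = ["F", "L", "P", "F", "N", "L", "F", "P"]
ratings_b = ["F", "L", "L", "F", "N", "L", "F", "P"]

kappa = cohens_kappa(ratings_a, ratings_b)
# Benchmark proposed in the paper:
#   kappa < 0.45  -> poor agreement
#   kappa > 0.62  -> substantial agreement (suggested minimum to aim for)
print(f"kappa = {kappa:.2f}")
```

In this toy example the two assessors disagree on one of eight ratings, giving a Kappa of roughly 0.83, which would sit comfortably above the 0.62 "substantial agreement" threshold proposed here.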