Software process assessments are by now a prevalent tool for process improvement and contract risk assessment in the software industry. Given that scores are assigned to processes during an assessment, a process assessment can be considered a subjective measurement procedure. As with any subjective measurement procedure, the reliability of process assessments has important implications for the utility of assessment scores, and therefore the reliability of assessments can be taken as a criterion for evaluating an assessment's quality. The particular type of reliability of interest in this paper is interrater agreement. Thus far, empirical evaluations of the interrater agreement of assessments have used Cohen's Kappa coefficient. Once a Kappa value has been derived, the next question is "how good is it?" Benchmarks for interpreting the obtained values of Kappa are available from the social sciences and medical literature. However, the applicability of these benchmarks to the software process assessment context is not obvious. In this paper we develop a benchmark for interpreting Kappa values using data from ratings of 70 process instances collected from assessments of 19 different projects in 7 different organizations in Europe during the SPICE Trials (an international effort to empirically evaluate the emerging ISO/IEC 15504 International Standard for Software Process Assessment). The benchmark indicates that Kappa values below 0.45 are poor, and values above 0.62 constitute substantial agreement and should be the minimum aimed for. This benchmark can be used to decide how good an assessment's reliability is.
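As a rough illustration of the quantity being benchmarked, the following Python sketch computes Cohen's Kappa from two assessors' independent ratings of the same process instances and classifies the result against the thresholds reported above. The rating data, the four-point rating scale, and the "moderate" label for the intermediate range are hypothetical illustrations, not values taken from the SPICE Trials data.

```python
# Minimal sketch (not from the paper): Cohen's Kappa for two raters,
# interpreted against the benchmark reported in the abstract.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from marginal frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1 - p_e)

def interpret(kappa):
    """Benchmark from the paper: < 0.45 poor, > 0.62 substantial.
    The intermediate label 'moderate' is an assumption for illustration."""
    if kappa < 0.45:
        return "poor"
    if kappa > 0.62:
        return "substantial"
    return "moderate"

# Hypothetical capability ratings (e.g. an N/P/L/F scale) assigned
# independently by two assessors to ten process instances.
a = ["F", "L", "L", "P", "F", "N", "L", "F", "P", "L"]
b = ["F", "L", "P", "P", "F", "N", "L", "L", "P", "L"]

k = cohens_kappa(a, b)
print(f"kappa = {k:.2f} ({interpret(k)})")  # kappa = 0.72 (substantial)
```

Note that Kappa corrects raw percent agreement for chance: with the example data the raters agree on 8 of 10 instances (p_o = 0.80), but roughly 0.29 of that agreement would be expected by chance alone given the marginal rating frequencies, yielding a Kappa of about 0.72.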