Modeling human coding of free response data

Authors:
Shahram Ghiasinejad;Richard M. Golden
Affiliations:
Department of Psychology, University of Central Florida, Orlando, FL 32816, United States;School of Behavioral and Brain Sciences, University of Texas at Dallas, United States
Venue:
Computers in Human Behavior
Year:
2013

Abstract

Summarization, recall, think-aloud, and question-answering protocols are examples of free response verbal reports used in the field of discourse processes to reveal the structure and content of internal mental representations and processes. Typically, two experienced coders independently annotate a portion of the collected protocol data semantically, and measures of agreement are used to determine the reliability of the coding. This methodology, however, does not provide an effective way to communicate complex coding procedures unambiguously to other researchers. To address this problem, an automated methodology for coding free response data, called AUTOCODER, is evaluated. The AUTOCODER system works by actively interacting with an experienced human coder who semantically annotates key words with "word-concepts" and sequences of word-concepts with "propositions". After training on a set of 70 segmented and semantically annotated free response verbal reports originally generated by second-grade and fifth-grade students, AUTOCODER exhibited a good proposition agreement rate of 91% and a kappa agreement score of 65% with respect to an experienced human coder on an additional set of 24 unsegmented free response verbal reports. Limitations and general implications of these findings are also discussed.
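
The abstract reports both a raw agreement rate and a kappa score. The following is a minimal sketch of how two such measures can be computed for a pair of coders, assuming the kappa in question is Cohen's kappa for two raters (the abstract does not name the variant). The function names and the toy proposition labels are hypothetical and are not taken from the paper or from AUTOCODER.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which the two coders assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = agreement_rate(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both coders pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both coders use a single label")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: proposition labels from a human coder and an
# automated coder for the same ten text segments.
human = ["P1", "P1", "P2", "P3", "P1", "P2", "P1", "P1", "P3", "P1"]
auto = ["P1", "P1", "P2", "P3", "P1", "P1", "P1", "P1", "P3", "P1"]

print(f"agreement rate: {agreement_rate(human, auto):.2f}")
print(f"Cohen's kappa:  {cohen_kappa(human, auto):.2f}")
```

Because kappa subtracts the agreement expected by chance, it is generally lower than the raw agreement rate, and the gap widens when a few labels dominate the data. This is one way a high agreement rate can coexist with a more moderate kappa.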