What different kinds of stratification can reveal about the generalizability of data-mined skill assessment models

  • Authors:
  • Michael A. Sao Pedro; Ryan S. J. D. Baker; Janice D. Gobert

  • Affiliations:
  • Worcester Polytechnic Institute, Worcester, MA; Teachers College, Columbia University, New York, NY; Worcester Polytechnic Institute, Worcester, MA

  • Venue:
  • Proceedings of the Third International Conference on Learning Analytics and Knowledge
  • Year:
  • 2013


Abstract

When validating assessment models built with data mining, generalization is typically tested at the student level, i.e., models are tested on new students. This approach, however, may fail to reveal cases where model performance suffers because other aspects of those cases relevant to prediction are not well represented. We explore this here by testing whether scientific inquiry skill models built and validated for one science topic can predict skill demonstration for new students and a new science topic. Test cases were chosen using two methods: student-level stratification, and stratification based on the number of trials run during students' experimentation. We found that the predictive performance of the models differed on each test set, revealing limitations that would have been missed by student-level validation alone.
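
The two test-set selection strategies contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' code; it assumes a hypothetical activity table with columns student_id, num_trials, and skill_demonstrated, and uses scikit-learn splitters to contrast a student-level hold-out with a split stratified on binned trial counts.

```python
# Illustrative sketch only (not the paper's implementation): two ways to pick
# test cases when validating a data-mined skill model, assuming each row of
# the table is one inquiry activity with a student id, the number of
# experiment trials the student ran, and a label indicating skill demonstration.
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, StratifiedShuffleSplit

rng = np.random.default_rng(0)
data = pd.DataFrame({
    "student_id": rng.integers(0, 50, size=500),         # hypothetical students
    "num_trials": rng.integers(1, 20, size=500),         # trials run per activity
    "skill_demonstrated": rng.integers(0, 2, size=500),  # label to predict
})

# (1) Student-level stratification: every activity from a held-out student goes
#     to the test set, so the model is evaluated on entirely new students.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx_student, test_idx_student = next(
    gss.split(data, groups=data["student_id"])
)

# (2) Stratification on the amount of experimentation: bin the trial counts and
#     sample the test set so each bin is represented, surfacing cases (e.g. very
#     few trials) that a student-level split may leave under-represented.
trial_bins = pd.qcut(data["num_trials"], q=4, labels=False, duplicates="drop")
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx_trials, test_idx_trials = next(sss.split(data, trial_bins))

print("student-level test set size:", len(test_idx_student))
print("trial-stratified test set size:", len(test_idx_trials))
```

Evaluating the same model on both held-out sets, as the abstract describes, is what exposes performance differences that a student-level split alone would hide.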