Evaluating text categorization
HLT '91 Proceedings of the workshop on Speech and Natural Language
Towards developing general models of usability with PARADISE
Natural Language Engineering
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
Predicting the quality and usability of spoken dialogue services
Speech Communication
Technical support dialog systems: issues, problems, and solutions
NAACL-HLT-Dialog '07 Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies
Comparing Linguistic Features for Modeling Learning in Computer Tutoring
Proceedings of the 2007 conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work
Exploiting discourse structure for spoken dialogue performance analysis
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Understanding complex natural language explanations in tutorial applications
ScaNaLU '06 Proceedings of the Third Workshop on Scalable Natural Language Understanding
Using Natural Language Processing to Analyze Tutorial Dialogue Corpora Across Domains and Modalities
Proceedings of the 2009 conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling
The “DeMAND” coding scheme: A “common language” for representing and analyzing student discourse
Proceedings of the 2009 conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling
Dealing with interpretation errors in tutorial dialogue
SIGDIAL '09 Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue
"Ask not what textual entailment can do for you..."
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
BEETLE II: a system for tutoring and computational linguistics experimentation
ACLDemos '10 Proceedings of the ACL 2010 System Demonstrations
SemEval-2010 task 12: Parser evaluation using textual entailments
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Intelligent tutoring with natural language support in the BEETLE II system
EC-TEL '10 Proceedings of the 5th European Conference on Technology Enhanced Learning: Sustaining TEL: from innovation to learning and practice
The AT&T spoken language understanding system
IEEE Transactions on Audio, Speech, and Language Processing
Towards effective tutorial feedback for explanation questions: a dataset and baselines
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
It is not always clear how the differences in intrinsic evaluation metrics for a parser or classifier will affect the performance of the system that uses it. We investigate the relationship between the intrinsic evaluation scores of an interpretation component in a tutorial dialogue system and the learning outcomes in an experiment with human users. Following the PARADISE methodology, we use multiple linear regression to build predictive models of learning gain, an important objective outcome metric in tutorial dialogue. We show that standard intrinsic metrics such as F-score alone do not predict the outcomes well. However, we can build predictive performance functions that account for up to 50% of the variance in learning gain by combining features based on standard evaluation scores and on the confusion matrix entries. We argue that building such predictive models can help us better evaluate performance of NLP components that cannot be distinguished based on F-score alone, and illustrate our approach by comparing the current interpretation component in the system to a new classifier trained on the evaluation data.
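The PARADISE-style analysis the abstract describes can be sketched in a few lines: fit a multiple linear regression from intrinsic evaluation features (F-score plus confusion-matrix-derived rates) to a learning-gain outcome, and report the proportion of variance explained (R²). This is a hypothetical illustration, not the authors' code; the feature names, the simulated data, and the coefficient values are all assumptions made for the example.

```python
# Minimal sketch of a PARADISE-style predictive performance function:
# regress learning gain on intrinsic evaluation features and report R^2.
# All data here is simulated; in the paper, features come from real
# per-session evaluation scores and confusion-matrix entries.
import numpy as np

rng = np.random.default_rng(0)
n = 40  # hypothetical number of tutoring sessions

# Hypothetical per-session features: interpreter F-score plus two
# confusion-matrix-derived error rates (e.g. false-accept / false-reject).
f_score = rng.uniform(0.6, 0.9, n)
false_accept = rng.uniform(0.0, 0.2, n)
false_reject = rng.uniform(0.0, 0.2, n)

# Simulated learning gain: a linear function of the features plus noise.
gain = (0.5 * f_score - 0.8 * false_accept - 0.3 * false_reject
        + rng.normal(0.0, 0.02, n))

# Multiple linear regression via ordinary least squares (with intercept).
X = np.column_stack([np.ones(n), f_score, false_accept, false_reject])
beta, *_ = np.linalg.lstsq(X, gain, rcond=None)
pred = X @ beta

# R^2: fraction of variance in learning gain accounted for by the model,
# the same quantity behind the paper's "up to 50% of the variance" claim.
ss_res = np.sum((gain - pred) ** 2)
ss_tot = np.sum((gain - gain.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 = {r2:.2f}")
```

With features that genuinely drive the outcome, R² is high; with F-score alone as the sole predictor it drops, which mirrors the paper's finding that standard intrinsic metrics by themselves predict learning gain poorly.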