While there has been much work on computational models that predict readability from the lexical, syntactic, and discourse properties of a text, interesting open questions remain about how computer-generated text should be evaluated with target populations. In this paper, we compare two offline methods for evaluating sentence quality: magnitude estimation of acceptability judgements, and sentence recall. These methods differ in the extent to which they can distinguish surface-level fluency from deeper comprehension issues. Most importantly, we find that the two measures correlate. Magnitude estimation can be run on the web without supervision, and its results can be analysed automatically. The sentence recall methodology is more resource-intensive, but allows us to tease apart the fluency and comprehension issues that arise.
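As a concrete illustration of the kind of automatic analysis the abstract alludes to, the sketch below follows the conventional magnitude estimation pipeline (normalising each participant's judgements by their rating of a shared reference, or modulus, sentence, log-transforming, and averaging per item) and then correlates the resulting acceptability scores with a second per-sentence measure such as recall accuracy. The variable names and toy data are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a standard magnitude estimation analysis:
# per-participant modulus normalisation, log transform, per-sentence
# averaging, and rank correlation against a second measure.
import math
from collections import defaultdict
from statistics import mean

from scipy.stats import spearmanr

# judgements[participant][sentence_id] = raw magnitude estimate
# ("mod" is the shared modulus sentence each participant also rates).
judgements = {
    "p1": {"mod": 50, "s1": 100, "s2": 25, "s3": 60},
    "p2": {"mod": 10, "s1": 18, "s2": 4, "s3": 12},
}

# Normalise by each participant's modulus rating, then log-transform,
# so scores are comparable across participants using different scales.
scores = defaultdict(list)
for ratings in judgements.values():
    modulus = ratings["mod"]
    for sent_id, raw in ratings.items():
        if sent_id != "mod":
            scores[sent_id].append(math.log(raw / modulus))

sentence_ids = sorted(scores)
acceptability = [mean(scores[s]) for s in sentence_ids]

# Hypothetical per-sentence recall accuracies for the same items.
recall = {"s1": 0.9, "s2": 0.4, "s3": 0.7}

rho, p = spearmanr(acceptability, [recall[s] for s in sentence_ids])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```

Spearman's rank correlation is used here because raw magnitude estimates sit on an arbitrary ratio scale; Pearson's r over the log-normalised scores would be an equally common choice.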