A dataset for the evaluation of lexical simplification

  • Authors:
  • Jan De Belder; Marie-Francine Moens

  • Affiliations:
  • Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium (both authors)

  • Venue:
  • CICLing'12: Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, Part II
  • Year:
  • 2012


Abstract

Lexical Simplification is the task of replacing individual words of a text with words that are easier to understand, so that the text as a whole becomes easier to comprehend, e.g. for people with learning disabilities or for children learning to read. Although this seems like a straightforward task, evaluating algorithms for it is not. The problem is how to build a dataset that provides an exhaustive list of easier-to-understand words in different contexts, and how to obtain an absolute ordering over this list of synonymous expressions. In this paper we reuse existing resources for a related problem, Lexical Substitution, and transform that dataset into a dataset for Lexical Simplification. The new dataset contains 430 sentences, each with one word marked. For that word, a list of words that can replace it, sorted by difficulty, is provided. The paper reports on how this dataset was created from the annotations of different persons, and on their agreement. In addition, we provide several metrics for computing the similarity between ranked lexical substitutions; these are used to assess the value of the different annotations, but can also be used to compare the lexical simplifications suggested by an algorithm with the ground-truth model.
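The abstract does not spell out which ranked-similarity metrics the paper uses, but a common way to compare a system's ranking of substitutes against a gold ranking is a rank correlation such as Kendall's tau over the words the two rankings share. The sketch below is purely illustrative: the function name, the example word lists, and the choice of Kendall's tau are assumptions, not the paper's actual metrics or data.

```python
# Illustrative sketch only: compare a hypothetical system ranking of
# substitutes (easiest first) against a gold ranking using Kendall's tau
# over the shared words. Not the metrics defined in the paper.
from scipy.stats import kendalltau

def ranked_substitution_similarity(gold_ranking, system_ranking):
    """Kendall's tau between two rankings, restricted to shared words."""
    shared = [w for w in gold_ranking if w in system_ranking]
    if len(shared) < 2:
        return 0.0  # too little overlap to compare orderings
    gold_ranks = [gold_ranking.index(w) for w in shared]
    sys_ranks = [system_ranking.index(w) for w in shared]
    tau, _ = kendalltau(gold_ranks, sys_ranks)
    return tau

# Hypothetical example: gold ranking ordered from easiest to hardest substitute.
gold = ["clear", "bright", "intelligent", "perspicacious"]
system = ["bright", "clear", "intelligent"]
print(ranked_substitution_similarity(gold, system))  # ~0.33
```

A metric of this kind rewards a system for getting the relative difficulty ordering right even when it proposes only a subset of the gold substitutes; the paper's own metrics may additionally weight coverage or position, which this sketch does not attempt to model.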