Revisiting the readability assessment of texts in Portuguese

  • Authors:
  • Carolina Scarton; Caroline Gasperin; Sandra Aluisio

  • Affiliations:
  • Center of Computational Linguistics, University of São Paulo, São Paulo, SP, Brazil (all three authors)

  • Venue:
  • IBERAMIA'10: Proceedings of the 12th Ibero-American Conference on Advances in Artificial Intelligence
  • Year:
  • 2010

Abstract

The Web Content Accessibility Guidelines (WCAG) 2.0 include, under their principle of comprehensibility, an accessibility requirement related to reading level. This requirement states that websites whose texts demand higher reading skills than those of individuals with lower secondary education (fifth to ninth grades in Brazil) should offer those readers an alternative version of the same content. Natural Language Processing technology and research in Psycholinguistics can help automate the task of classifying a text according to its reading difficulty. In this paper, we present experiments to build a readability checker that classifies texts in Portuguese, considering different text genres, domains and reader ages, using naturally occurring texts. More precisely, we classify texts as simple (for 7- to 14-year-olds) or complex (for adults), and address three key research questions: (1) Which machine-learning algorithm produces the best results? (2) Which features are relevant? (3) Do different text genres have an impact on readability assessment?
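The abstract frames readability assessment as binary classification (simple vs. complex) driven by text features. As a minimal illustration only, and not the authors' pipeline, feature set, or algorithm, the sketch below assumes two hypothetical surface features (average sentence length in words and average word length in characters) and a nearest-centroid classifier. The class name `NearestCentroidReadability`, the helper `surface_features`, and the four Portuguese training sentences are all invented for this example; they are not data or code from the paper.

```python
import re
from statistics import mean, pstdev

Features = tuple[float, float]


def surface_features(text: str) -> Features:
    """Hypothetical features: (avg sentence length in words, avg word length in chars)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # str patterns are Unicode-aware in Python 3, so "árvore" is a single token
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    return (len(words) / len(sentences), mean(len(w) for w in words))


class NearestCentroidReadability:
    """Toy binary classifier: assign the label whose class centroid is closest."""

    def fit(self, texts: list[str], labels: list[str]) -> "NearestCentroidReadability":
        X = [surface_features(t) for t in texts]
        columns = list(zip(*X))
        # z-score each feature so sentence length (tens of words) does not swamp word length
        self.means = [mean(c) for c in columns]
        self.stdevs = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread
        Z = [self._scale(x) for x in X]
        self.centroids = {}
        for label in set(labels):
            rows = [z for z, lab in zip(Z, labels) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def _scale(self, x: Features) -> list[float]:
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stdevs)]

    def predict(self, text: str) -> str:
        z = self._scale(surface_features(text))

        def sq_dist(centroid: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(z, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


# Invented toy examples, for illustration only
train_texts = [
    "O gato subiu na árvore. Ele viu um pássaro.",
    "A menina gosta de ler. Ela lê todos os dias.",
    "A implementação de diretrizes de acessibilidade exige a adaptação sistemática "
    "de conteúdos institucionais destinados a públicos heterogêneos.",
    "Os mecanismos regulatórios estabelecidos pela legislação vigente condicionam "
    "a concessão de benefícios previdenciários complementares.",
]
train_labels = ["simple", "simple", "complex", "complex"]

clf = NearestCentroidReadability().fit(train_texts, train_labels)
print(clf.predict("O cachorro correu. Ele pulou no lago."))  # → simple
print(clf.predict(
    "A avaliação automática da inteligibilidade textual depende de "
    "características linguísticas sofisticadas."
))  # → complex
```

A nearest-centroid rule was chosen here only because it fits in a few lines without external libraries; comparing learning algorithms and selecting relevant features, as the paper's research questions (1) and (2) propose, would mean swapping in other classifiers and richer feature sets over a real annotated corpus.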