Searching a mixed corpus in the light of the new Portuguese orthographic norm

  • Authors:
  • Gracinda Carvalho;Isabel Falé;David Martins de Matos;Vitor Rocio

  • Affiliations:
  • Universidade Aberta, Lisboa, Portugal and L2F, INESC-ID Lisboa, Lisboa, Portugal and CITI - FCT, UNL, Lisboa, Portugal;Universidade Aberta, Lisboa, Portugal;L2F, INESC-ID Lisboa, Lisboa, Portugal and Instituto Superior Técnico, UTL, Lisboa, Portugal;Universidade Aberta, Lisboa, Portugal and CITI - FCT, UNL, Lisboa, Portugal

  • Venue:
  • PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
  • Year:
  • 2012

Abstract

A mixed corpus of Portuguese is one in which texts of different origins use different spelling variants for the same word. A new orthographic norm, intended to bring together written texts produced in Portugal and Brazil under a more uniform orthography, has been in effect since 2009. From the perspective of search, what happens to corpora created before the norm came into practice, or during the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions.