Searching a mixed corpus in the light of the new Portuguese orthographic norm

Authors:
Gracinda Carvalho;Isabel Falé;David Martins de Matos;Vitor Rocio
Affiliations:
Universidade Aberta, Lisboa, Portugal and L2F, INESC-ID Lisboa, Lisboa, Portugal and CITI - FCT, UNL, Lisboa, Portugal;Universidade Aberta, Lisboa, Portugal;L2F, INESC-ID Lisboa, Lisboa, Portugal and Instituto Superior Técnico, UTL, Lisboa, Portugal;Universidade Aberta, Lisboa, Portugal and CITI - FCT, UNL, Lisboa, Portugal
Venue:
PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
Year:
2012

Abstract

A mixed corpus of Portuguese is one in which texts of different origins produce different spelling variants for the same word. A new orthographic norm, which will bring together the written texts produced in both Portugal and Brazil and give them a more uniform orthography, has been in effect since 2009. From the perspective of search, what happens to corpora created before the norm came into practice, or during the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions.