Contractions: breaking the tokenization-tagging circularity

  • Authors:
  • António Horta Branco; João Ricardo Silva

  • Affiliations:
  • University of Lisbon, Dept. of Informatics; University of Lisbon, Dept. of Informatics

  • Venue:
  • PROPOR'03: Proceedings of the 6th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2003

Abstract

Ambiguous strings are strings of non-whitespace characters, typically coinciding with orthographic contractions of word forms, that, depending on the specific occurrence, are to be considered as consisting of either one token or more than one. Strings of this sort are shown to raise the problem of undesired circularity between tokenization and tagging. This paper presents a strategy to resolve ambiguous strings and dissolve that circularity.
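
To make the circularity concrete: the Portuguese string "deste" can be the contraction of "de" + "este" (two tokens) or a form of the verb "dar" (one token). Splitting it correctly requires knowing its tag, while tagging normally presupposes a finished tokenization. The sketch below illustrates one possible way out, and is not necessarily the strategy the paper adopts: leave ambiguous strings unsplit, let the tagger assign either a simple or a composite tag, and split only afterwards. The lexicon, the tag names, the placeholder tag "X", and the toy tagging rule are all assumptions made for illustration; a real system would use a trained tagger.

```python
# Hypothetical lexicon of ambiguous strings: each possible tag maps to the
# token sequence that the string expands to under that reading.
AMBIGUOUS = {
    "deste": {"PREP+DEM": ["de", "este"], "V": ["deste"]},
    "desse": {"PREP+DEM": ["de", "esse"], "V": ["desse"]},
}


def pre_tokenize(text):
    """First pass: split on whitespace only, leaving ambiguous strings whole."""
    return text.split()


def toy_tagger(tokens):
    """Stand-in for a real tagger (an assumption, not a real model).

    An ambiguous string directly after the subject pronoun "tu" is tagged
    as a verb; otherwise it receives the composite tag PREP+DEM.
    Every other token gets the placeholder tag "X".
    """
    tagged = []
    for i, token in enumerate(tokens):
        if token.lower() in AMBIGUOUS:
            after_tu = i > 0 and tokens[i - 1].lower() == "tu"
            tagged.append((token, "V" if after_tu else "PREP+DEM"))
        else:
            tagged.append((token, "X"))
    return tagged


def post_tokenize(tagged):
    """Second pass: split ambiguous strings according to the tag assigned."""
    result = []
    for token, tag in tagged:
        readings = AMBIGUOUS.get(token.lower())
        if readings is None:
            result.append((token, tag))
            continue
        # A composite tag such as "PREP+DEM" pairs up with a two-token
        # expansion; a simple tag such as "V" pairs with a single token.
        # Expanded forms come from the lexicon, so they are lowercase.
        result.extend(zip(readings[tag], tag.split("+")))
    return result


def resolve(text):
    """Pre-tokenize, tag, then split: tagging never waits on the split."""
    return post_tokenize(toy_tagger(pre_tokenize(text)))
```

Under these assumptions, resolve("gosto deste livro") splits "deste" into "de" and "este", while resolve("tu deste o livro") keeps it as a single verb token. The point of the design is that the decision to split is postponed until tag information is available, so neither step has to wait on the other.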