Effective architecture of the Polish tagger

  • Authors:
  • Maciej Piasecki;Grzegorz Godlewski

  • Affiliations:
  • Institute of Applied Informatics, Wrocław University of Technology, Wrocław, Poland;Institute of Applied Informatics, Wrocław University of Technology, Wrocław, Poland

  • Venue:
  • TSD'06: Proceedings of the 9th International Conference on Text, Speech and Dialogue
  • Year:
  • 2006

Abstract

The large tagset of the IPI PAN Corpus of Polish and the limited size of the learning corpus make construction of a tagger especially demanding. The goal of this work is to decompose the overall process of tagging Polish into subproblems of partial disambiguation. Moreover, a tagger architecture that facilitates this decomposition is proposed. The architecture enables easy integration of hand-written tagging rules with the rest of the tagger and is open to different types of classifiers. A complete tagger for Polish, called TaKIPI, is also presented. Its configuration, the achieved results (92.55% accuracy for all tokens and 84.75% for ambiguous tokens in a ten-fold test), and the considered variants of the architecture are also discussed.
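
The idea of decomposing tagging into partial disambiguation steps, with hand-written rules sitting alongside a pluggable classifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the attribute names (`pos`, `case`), the order of the layers, the toy preposition rule, and the "first candidate" stand-in classifier are all hypothetical and do not describe TaKIPI's actual configuration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

# A candidate tag is a set of (attribute, value) pairs, e.g. {("pos", "noun"), ("case", "loc")}.
Tag = FrozenSet[Tuple[str, str]]


def make_tag(**attrs: str) -> Tag:
    return frozenset(attrs.items())


def value(tag: Tag, attr: str) -> Optional[str]:
    """Value of one attribute in a tag, or None if the tag lacks that attribute."""
    return dict(tag).get(attr)


@dataclass
class Token:
    form: str
    candidates: List[Tag]  # tags still possible for this token


# A chooser proposes a value for one attribute of one token, or abstains with None.
Chooser = Callable[[List[Token], int, str], Optional[str]]


def narrow(token: Token, attr: str, chosen: str) -> None:
    """Partial disambiguation: keep only candidates that agree on one attribute."""
    kept = [t for t in token.candidates if value(t, attr) == chosen]
    if kept:  # never discard every candidate
        token.candidates = kept


def tag_sentence(sentence: List[Token], layers: List[str],
                 rules: List[Chooser], classifier: Chooser) -> List[Token]:
    """Resolve one attribute per layer; rules are tried before the classifier."""
    for attr in layers:
        for i, tok in enumerate(sentence):
            if len({value(t, attr) for t in tok.candidates}) < 2:
                continue  # not ambiguous with respect to this attribute
            chosen = None
            for rule in rules:
                chosen = rule(sentence, i, attr)
                if chosen is not None:
                    break
            if chosen is None:
                chosen = classifier(sentence, i, attr)
            if chosen is not None:
                narrow(tok, attr, chosen)
    return sentence


def after_w_rule(sentence: List[Token], i: int, attr: str) -> Optional[str]:
    """Hypothetical hand-written rule: prefer locative right after the preposition 'w'."""
    if attr == "case" and i > 0 and sentence[i - 1].form == "w":
        return "loc"
    return None


def first_candidate(sentence: List[Token], i: int, attr: str) -> Optional[str]:
    """Stand-in for a trained classifier: take the value from the first listed candidate."""
    return value(sentence[i].candidates[0], attr)


sentence = [
    Token("mam", [make_tag(pos="verb"), make_tag(pos="noun", case="gen")]),
    Token("w", [make_tag(pos="prep")]),
    Token("domu", [make_tag(pos="noun", case="gen"),
                   make_tag(pos="noun", case="loc"),
                   make_tag(pos="noun", case="voc")]),
]
result = tag_sentence(sentence, layers=["pos", "case"],
                      rules=[after_w_rule], classifier=first_candidate)
for tok in result:
    print(tok.form, [sorted(t) for t in tok.candidates])
```

Because each layer only narrows the candidate set for one attribute, any function with the `Chooser` signature can be swapped in for `first_candidate`, which is one way to read the claim that the architecture is open to different types of classifiers.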