Towards morphologically annotated corpus of hospital discharge reports in Polish

  • Authors:
  • Małgorzata Marciniak;Agnieszka Mykowiecka

  • Affiliations:
  • Institute of Computer Science PAS, ul. J. K. Ordona, Warszawa, Poland;Institute of Computer Science PAS, ul. J. K. Ordona, Warszawa, Poland

  • Venue:
  • BioNLP '11 Proceedings of BioNLP 2011 Workshop
  • Year:
  • 2011

Abstract

The paper discusses problems in annotating a corpus of Polish clinical data with low-level linguistic information. We propose an approach to tokenization and automatic morphological annotation of the data that combines existing programs with a set of domain-specific rules and vocabulary. Finally, we present the results of manual verification of the annotation for a subset of the data.