Structure annotation in the Polish corpus of suicide notes

  • Authors:
  • Michał Marcińczuk; Monika Zaśko-Zielińska; Maciej Piasecki

  • Affiliations:
  • Institute of Informatics, Wrocław University of Technology, Wrocław, Poland; Institute of Polish Philology, Wrocław University of Technology, Wrocław, Poland; Institute of Informatics, Wrocław University of Technology, Wrocław, Poland

  • Venue:
  • TSD'11 Proceedings of the 14th International Conference on Text, Speech and Dialogue
  • Year:
  • 2011

Abstract

The Polish Corpus of Suicide Notes (henceforth PCSN) was built to meet the needs of forensic linguistics. Suicide notes are messages written in a borderline situation, shortly before death. Hence the annotation schema requires a complex description of the document structure, the textual content, and its linguistic properties. The TEI (Text Encoding Initiative) guidelines were selected as the basis for the document encoding schema. The paper discusses, with examples, how TEI was adapted and extended to encode aspects such as the letter structure, various layers of changes and omissions, error correction, and extra-linguistic elements.
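The abstract does not reproduce the PCSN schema itself. As a rough illustration only, the sketch below encodes a toy letter using standard TEI P5 elements that cover the aspects listed above: letter structure (`<opener>`, `<salute>`, `<closer>`, `<signed>`), the author's own changes (`<del>`, `<add>`), omissions (`<gap>`), and error correction (`<choice>` with `<sic>` and `<corr>`). The placeholder text, the attribute values, and the helper names `build_letter`, `reading` and `SKIP` are assumptions made for this example; the paper's actual adaptations and extensions of TEI may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the TEI P5 guidelines; registered as the default so the
# serialized XML uses unprefixed element names.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return the namespace-qualified ElementTree name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_letter() -> ET.Element:
    """Encode a toy letter with standard TEI P5 elements.

    Hypothetical placeholder text; this is not the PCSN schema.
    """
    letter = ET.Element(tei("div"), {"type": "letter"})

    # Letter structure: opening formula, body paragraph, closing signature.
    opener = ET.SubElement(letter, tei("opener"))
    ET.SubElement(opener, tei("salute")).text = "Dear X,"
    body = ET.SubElement(letter, tei("p"))
    closer = ET.SubElement(letter, tei("closer"))
    ET.SubElement(closer, tei("signed")).text = "Y"

    # Omission: an illegible stretch is recorded rather than guessed.
    body.text = "This is a "
    gap = ET.SubElement(body, tei("gap"),
                        {"reason": "illegible", "quantity": "1", "unit": "word"})
    gap.tail = " sample "

    # Author's own change: a struck-out word and its replacement above the line.
    ET.SubElement(body, tei("del"), {"rend": "strikethrough"}).text = "sentense"
    added = ET.SubElement(body, tei("add"), {"place": "above"})
    added.text = "sentence"
    added.tail = " with an "

    # Error correction: the original form and the editor's correction side by side.
    choice = ET.SubElement(body, tei("choice"))
    ET.SubElement(choice, tei("sic")).text = "eror"
    ET.SubElement(choice, tei("corr")).text = "error"
    choice.tail = "."
    return letter


# Which editorial layers each (hypothetical) reading leaves out.
SKIP = {
    "corrected": {"del", "sic"},  # final revisions, errors normalised
    "original": {"del", "corr"},  # final revisions, author's spelling kept
}


def reading(elem: ET.Element, mode: str) -> str:
    """Flatten elem to plain text for the given reading mode."""
    parts = [elem.text or ""]
    for child in elem:
        name = child.tag.split("}")[-1]  # drop the namespace part
        if name == "gap":
            parts.append("[...]")  # show the omission explicitly
        elif name not in SKIP[mode]:
            parts.append(reading(child, mode))
        parts.append(child.tail or "")  # tail text follows even skipped elements
    return "".join(parts)


if __name__ == "__main__":
    letter = build_letter()
    body = letter.find(tei("p"))
    print(reading(body, "corrected"))  # This is a [...] sample sentence with an error.
    print(reading(body, "original"))   # This is a [...] sample sentence with an eror.
    print(ET.tostring(letter, encoding="unicode"))
```

Keeping both `<sic>` and `<corr>` in one encoding lets the same document yield either the author's spelling or a normalised text, which is presumably useful when the original errors are themselves of forensic interest.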