Towards morphologically annotated corpus of hospital discharge reports in Polish

Authors:
Małgorzata Marciniak; Agnieszka Mykowiecka
Affiliations:
Institute of Computer Science PAS, ul. J. K. Ordona, Warszawa, Poland; Institute of Computer Science PAS, ul. J. K. Ordona, Warszawa, Poland
Venue:
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Year:
2011

Abstract

The paper discusses problems in annotating a corpus of Polish clinical data with low-level linguistic information. We propose an approach to tokenization and automatic morphological annotation of the data that combines existing programs with a set of domain-specific rules and vocabulary. Finally, we present the results of manual verification of the annotation for a subset of the data.