Named entities in Czech: annotating data and developing NE tagger

  • Authors:
  • Magda Ševčíková; Zdeněk Žabokrtský; Oldřich Krůza

  • Affiliations:
  • Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic (all three authors)

  • Venue:
  • TSD'07: Proceedings of the 10th International Conference on Text, Speech and Dialogue
  • Year:
  • 2007

Abstract

This paper deals with the treatment of named entities (NEs) in Czech. We introduce a two-level NE classification and have used it to manually annotate two thousand sentences, yielding more than 11,000 NE instances. Using the annotated data and machine-learning techniques (namely, top-down induction of decision trees), we have developed and evaluated a software system for the automatic detection and classification of NEs in Czech texts.
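The abstract names top-down induction of decision trees as the learning technique. As a rough illustration only, the Python sketch below implements a generic ID3-style top-down induction (information-gain splits over categorical features) and applies it to a token-level NE classification task with two-level labels. The feature names (`capitalized`, `in_first_name_list`, `prev_token`), the four toy training rows, and the (supertype, subtype) label pairs are hypothetical. They are not taken from the paper's classification, feature set, or annotated data, and the authors' actual learner may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def induce_tree(rows, labels, features):
    """Top-down (ID3-style) induction: pick the best split, then recurse."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: pure node or no features left
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 0:
        return majority  # leaf: no split reduces entropy
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = induce_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    # "default" handles feature values never seen during training
    return {"feature": best, "branches": branches, "default": majority}


def classify(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical toy data: one feature dict per token, with
# (supertype, subtype) labels standing in for a two-level classification.
rows = [
    {"capitalized": True, "in_first_name_list": True, "prev_token": "pan"},
    {"capitalized": True, "in_first_name_list": False, "prev_token": "v"},
    {"capitalized": True, "in_first_name_list": False, "prev_token": "pan"},
    {"capitalized": False, "in_first_name_list": False, "prev_token": "v"},
]
labels = [
    ("person", "first_name"),
    ("geographic", "town"),
    ("person", "surname"),
    ("none", "none"),
]

tree = induce_tree(rows, labels, ["capitalized", "in_first_name_list", "prev_token"])
print(classify(tree, {"capitalized": True, "in_first_name_list": False, "prev_token": "v"}))
```

With this toy data the root split is on `prev_token`, and the printed label is `('geographic', 'town')`. Because each label is a (supertype, subtype) pair, taking `[0]` of a prediction gives the coarse level. A real system would train on many more tokens and richer features, and would likely need pruning to avoid overfitting.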