Czech named entity corpus and SVM-based recognizer

  • Authors:
  • Jana Kravalová; Zdeněk Žabokrtský

  • Affiliations:
  • Charles University in Prague; Charles University in Prague

  • Venue:
  • NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
  • Year:
  • 2009

Abstract

This paper deals with the recognition of named entities in Czech texts. We present a recently released corpus of Czech sentences with manually annotated named entities, in which a rich two-level classification scheme was used. The corpus contains around 6000 sentences with roughly 33000 marked named entity instances. We use the data to train and evaluate a named entity recognizer based on the Support Vector Machine classification technique. The presented recognizer outperforms the results previously reported for NE recognition in Czech.