Czech named entity corpus and SVM-based recognizer

Authors:
Jana Kravalová;Zdeněk Žabokrtský
Affiliations:
Charles University in Prague;Charles University in Prague
Venue:
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Year:
2009

Citing 8
Cited 3

Message Understanding Conference-6: a brief history

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Efficient support vector classifiers for named entity recognition

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Fine grained classification of named entities

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A robust risk minimization based named entity recognition system

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Non-projective dependency parsing using spanning tree algorithms

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A context pattern induction method for named entity extraction

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
TectoMT: highly modular MT system with tectogrammatics used as transfer layer

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Named entities in Czech: annotating data and developing NE tagger

TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue

TectoMT: modular NLP framework

IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Integration of speech and text processing modules into a real-time dialogue system

TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Maximum entropy named entity recognition for Czech language

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper deals with recognition of named entities in Czech texts. We present a recently released corpus of Czech sentences with manually annotated named entities, in which a rich two-level classification scheme was used. There are around 6000 sentences in the corpus with roughly 33000 marked named entity instances. We use the data for training and evaluating a named entity recognizer based on Support Vector Machine classification technique. The presented recognizer outperforms the results previously reported for NE recognition in Czech.