Implementation of Croatian NERC system

Authors:
Božo Bekavac; Marko Tadić
Affiliations:
University of Zagreb, Zagreb, Croatia; University of Zagreb, Zagreb, Croatia
Venue:
ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
Year:
2007

Abstract

This paper describes a system for Named Entity Recognition and Classification (NERC) in Croatian. The system consists of a sentence segmentation module, an inflectional lexicon of common words, an inflectional lexicon of names, and regular local grammars for the automatic recognition of numerical and temporal expressions. After the first step (sentence segmentation), the second step attaches to each token its full morphosyntactic description and appropriate lemma, together with additional tags for potential name categories, without disambiguation. The third step (the core of the system) applies a set of rules for the recognition and classification of named entities in the already annotated text. Rules based on the described strategies (such as internal and external evidence) are applied as a cascade of transducers in a defined order. Although other classification schemes for NEs exist, the output of our system is NEs annotated according to the MUC-7 specification. The system is applied to informative and noninformative texts, and the results are compared. On informative texts, the system achieves an F-measure of over 90%.
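
The idea of applying rules in a fixed cascade, with internal and external evidence, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word lists (`FIRST_NAMES`, `ORG_MARKERS`, `PERSON_TRIGGERS`), the rule functions, and the example sentence are all hypothetical, and plain Python functions stand in for finite-state transducers. The sketch also ignores inflection, which the described system handles through its inflectional lexicons. Only the output tag names (`ENAMEX`, `NUMEX` with a `TYPE` attribute) follow the MUC-7 convention.

```python
import re

# Hypothetical toy resources; the described system uses full inflectional lexicons.
FIRST_NAMES = {"Ivan", "Ana", "Marko"}
ORG_MARKERS = {"d.d.", "d.o.o."}                            # internal evidence
PERSON_TRIGGERS = {"profesor", "predsjednik", "gospodin"}   # external evidence


def is_cap(tok):
    """True if the token starts with an uppercase letter."""
    return tok[:1].isupper()


def rule_percent(tokens, taken):
    """NUMEX: a number immediately followed by '%'."""
    return [(i, i + 2, "NUMEX", "PERCENT")
            for i in range(len(tokens) - 1)
            if re.fullmatch(r"\d+(,\d+)?", tokens[i]) and tokens[i + 1] == "%"]


def rule_org_internal(tokens, taken):
    """Internal evidence: capitalized run ending in a company-form marker."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok in ORG_MARKERS:
            start = i
            while start > 0 and is_cap(tokens[start - 1]) and (start - 1) not in taken:
                start -= 1
            if start < i:
                spans.append((start, i + 1, "ENAMEX", "ORGANIZATION"))
    return spans


def rule_person_internal(tokens, taken):
    """Internal evidence: known first name followed by a capitalized token."""
    return [(i, i + 2, "ENAMEX", "PERSON")
            for i in range(len(tokens) - 1)
            if tokens[i] in FIRST_NAMES and is_cap(tokens[i + 1])]


def rule_person_external(tokens, taken):
    """External evidence: a title or role word followed by capitalized tokens."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok.lower() in PERSON_TRIGGERS:
            end = i + 1
            while end < len(tokens) and is_cap(tokens[end]):
                end += 1
            if end > i + 1:
                spans.append((i + 1, end, "ENAMEX", "PERSON"))
    return spans


# The order matters: earlier stages claim tokens that later stages must skip.
CASCADE = [rule_percent, rule_org_internal, rule_person_internal, rule_person_external]


def annotate(tokens):
    """Run the cascade; keep a span only if none of its tokens is already taken."""
    taken, spans = set(), []
    for rule in CASCADE:
        for start, end, tag, typ in rule(tokens, taken):
            if taken.isdisjoint(range(start, end)):
                spans.append((start, end, tag, typ))
                taken.update(range(start, end))
    return sorted(spans)


def render(tokens, spans):
    """Emit the token sequence with MUC-7 style inline markup."""
    by_start = {s[0]: s for s in spans}
    out, i = [], 0
    while i < len(tokens):
        if i in by_start:
            start, end, tag, typ = by_start[i]
            out.append(f'<{tag} TYPE="{typ}">{" ".join(tokens[start:end])}</{tag}>')
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


sentence = "Profesor Horvat i Ana Kovač posjetili su tvrtku Podravka d.d. , čiji je prihod rastao 5 % ."
tokens = sentence.split()
print(render(tokens, annotate(tokens)))
```

In this toy cascade, "Horvat" is recognized only through external evidence (the preceding title), while "Ana Kovač" and "Podravka d.d." are recognized through internal evidence (a listed first name and a company-form marker). Reordering `CASCADE` can change which rule claims a token, which is why the order of application has to be defined.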