From Czech morphology through partial parsing to disambiguation

Authors:
Eva Mráková;Radek Sedláček
Affiliations:
NLP Laboratory, Faculty of Informatics, Masaryk University, Brno, Czech Republic;NLP Laboratory, Faculty of Informatics, Masaryk University, Brno, Czech Republic
Venue:
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Year:
2003

Abstract

This paper describes a complex system for processing raw Czech texts. Several modules were implemented, each performing a different level of processing. These modules can easily be incorporated into many other linguistic applications, and some of them are already used in this way. The first level of processing is a reliable morphological analysis: we survey the effective implementation of ajka, a robust morphological analyser for Czech. Texts tagged by ajka can be further processed by the partial parser DIS and its extension VADIS, which is based on verb valencies. The output of these systems serves for automatic partial disambiguation of the input texts. The tools described in this paper are widely used for parsing large corpora and can be employed in the initial phase of semantic analysis.
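
The pipeline outlined in the abstract (morphological analysis, then partial parsing, then partial disambiguation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the `Reading` tuple, and the functions `analyse` and `disambiguate` are hypothetical and do not reproduce the tag set, data, or rules of ajka, DIS, or VADIS. The single "partial parsing" rule used here is adjective-noun agreement in gender, number, and case, which is only one plausible kind of constraint.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One morphological reading of a word form (hypothetical, simplified)."""
    lemma: str
    pos: str     # "ADJ" or "NOUN" in this toy example
    gender: str  # "F", "N", ...
    number: str  # "SG" or "PL"
    case: str    # "NOM", "ACC", ...


# Toy lexicon standing in for a morphological analyser.
# The readings are a simplified subset chosen for illustration only.
TOY_LEXICON = {
    "mladá": [
        Reading("mladý", "ADJ", "F", "SG", "NOM"),
        Reading("mladý", "ADJ", "N", "PL", "NOM"),
        Reading("mladý", "ADJ", "N", "PL", "ACC"),
    ],
    "žena": [Reading("žena", "NOUN", "F", "SG", "NOM")],
}


def analyse(tokens):
    """Step 1: return every known reading of each token (ambiguous output)."""
    return [list(TOY_LEXICON.get(token.lower(), [])) for token in tokens]


def agrees(a, b):
    """True if two readings share gender, number, and case."""
    return (a.gender, a.number, a.case) == (b.gender, b.number, b.case)


def disambiguate(readings):
    """Steps 2 and 3: find adjacent ADJ + NOUN pairs that agree and keep
    only the readings that take part in such a pair. Positions where no
    agreeing pair exists are left untouched, so the result is partial."""
    result = [list(r) for r in readings]
    for i in range(len(result) - 1):
        left, right = result[i], result[i + 1]
        pairs = [
            (adj, noun)
            for adj in left
            for noun in right
            if adj.pos == "ADJ" and noun.pos == "NOUN" and agrees(adj, noun)
        ]
        if pairs:
            keep_left = {adj for adj, _ in pairs}
            keep_right = {noun for _, noun in pairs}
            result[i] = [r for r in left if r in keep_left]
            result[i + 1] = [r for r in right if r in keep_right]
    return result


if __name__ == "__main__":
    ambiguous = analyse(["mladá", "žena"])
    for token, kept in zip(["mladá", "žena"], disambiguate(ambiguous)):
        print(token, kept)
```

In this sketch the adjective form starts with three readings and the noun with one; the agreement rule leaves only the feminine singular nominative reading of the adjective. A real system would rely on a full morphological dictionary and a much richer set of patterns.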