This paper describes a complex system for processing raw Czech texts. It consists of several modules, each performing a different level of processing. The modules can easily be incorporated into other linguistic applications, and some of them are already used in this way. The first level of processing is reliable morphological analysis: we survey the effective implementation of ajka, a robust morphological analyser for Czech. Texts tagged by ajka can be further processed by the partial parser DIS and by its extension VADIS, which is based on verb valencies. The output of these systems is used for automatic partial disambiguation of the input texts. The tools described in this paper are widely used for parsing large corpora and can be employed in the initial phase of semantic analysis.
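The two-stage flow described above (ambiguous morphological analysis first, partial disambiguation second) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-word lexicon, the tag names `VERB`, `NOUN` and `PREP`, the functions `analyse`, `disambiguate` and `process`, and the single context rule are hypothetical and do not reflect the actual interfaces, tagset or rules of ajka, DIS or VADIS.

```python
# Toy lexicon: hypothetical data, not ajka's dictionary or tagset.
# Each word form maps to all of its possible (lemma, tag) readings.
LEXICON = {
    "pracuje": {("pracovat", "VERB")},
    "pro": {("pro", "PREP")},
    # "stát" is ambiguous: the noun "state" or the verb "to stand".
    "stát": {("stát", "NOUN"), ("stát", "VERB")},
}


def analyse(token):
    """Stage 1: return every reading of a token (empty set if unknown)."""
    return set(LEXICON.get(token.lower(), set()))


def disambiguate(readings):
    """Stage 2: apply one illustrative context rule.

    A word directly after an unambiguous preposition loses its VERB
    readings, unless that would leave it with no reading at all.
    """
    result = [set(r) for r in readings]
    for i in range(1, len(result)):
        prev = result[i - 1]
        if prev and all(tag == "PREP" for _, tag in prev):
            kept = {r for r in result[i] if r[1] != "VERB"}
            if kept:  # never remove the last remaining reading
                result[i] = kept
    return result


def process(tokens):
    """Run both stages over a tokenised sentence."""
    return disambiguate([analyse(t) for t in tokens])


if __name__ == "__main__":
    # "pracuje pro stát" = "(he/she) works for the state"
    for token, readings in zip(["pracuje", "pro", "stát"],
                               process(["pracuje", "pro", "stát"])):
        print(token, sorted(readings))
```

The disambiguation here is deliberately partial: the rule only discards readings it can rule out from the immediate context and leaves any remaining ambiguity for later stages.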