We present results of probabilistic tagging of Czech texts to show how these techniques work for a highly morphologically ambiguous inflective language. After describing the tag system used, we report the results of four experiments that tag Czech texts with a simple probabilistic model (a unigram experiment, two bigram experiments, and a trigram experiment). For comparison, we applied the same code and settings to tag an English text (another four experiments), using training and test data of the same size so that the comparison is valid. The experiments use the source channel model and maximum likelihood training on a Czech hand-tagged corpus and on the tagged Wall Street Journal (WSJ) corpus from the LDC collection. The experiments show, not surprisingly, that the success rate improves as the amount of training data grows. The results also indicate that inflective languages with 1000+ tags require a more sophisticated approach to get closer to an acceptable error rate. To compare two different approaches to text tagging, statistical and rule-based, we modified Eric Brill's rule-based part-of-speech tagger and carried out two more experiments on the Czech data, obtaining similar results in terms of the error rate. We also ran three more experiments with a greatly reduced tagset to obtain another comparison based on similar tagset size.
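To make the bigram setting concrete, the following is a minimal sketch of a source channel tagger trained by counting, not the authors' code. The function names, the toy English data, and the add-one smoothing are all assumptions introduced for illustration; the abstract does not say how unseen events were handled.

```python
from collections import Counter, defaultdict
import math


def train_bigram_tagger(tagged_sentences):
    """Collect the counts that maximum likelihood estimation needs."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, n_outcomes):
    # Add-one smoothing: an assumption of this sketch, not taken from the paper.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi_tag(words, transitions, emissions, vocab):
    """Return the most probable tag sequence under the bigram model."""
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    # best[tag] = (log score of the best path ending in tag, that path)
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for tag in tags:
            emit = log_prob(emissions[tag], word, n_words)
            score, path = max(
                (
                    (prev_score + log_prob(transitions[prev], tag, n_tags) + emit,
                     prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

Calling `train_bigram_tagger` on a list of sentences, each a list of `(word, tag)` pairs, and passing the three returned values to `viterbi_tag` together with a list of words yields one tag per word. With a tagset of 1000+ tags the transition table becomes far sparser than in this toy setting, which is consistent with the abstract's remark that such languages need a more sophisticated approach.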