One of the most important preliminary tasks for robust part-of-speech tagging is the correct tokenization or segmentation of the text. This task can involve processes much more complex than simply identifying the sentences in a text and their individual components, yet many current applications skip it. Nevertheless, this preprocessing step is indispensable in practice, and it is particularly difficult to treat rigorously without repeatedly descending into a case-by-case analysis of every phenomenon detected. In this work, we have developed a preprocessing scheme oriented towards the disambiguation and robust tagging of Galician. The proposal is nevertheless a general architecture that can be applied to other languages, such as Spanish, with very slight modifications.