Statistical methods for speech recognition
Tokenization and Proper Noun Recognition for Information Retrieval
DEXA '02 Proceedings of the 13th International Workshop on Database and Expert Systems Applications
Applying Productive Derivational Morphology to Term Indexing of Spanish Texts
CICLing '01 Proceedings of the Second International Conference on Computational Linguistics and Intelligent Text Processing
Formal Methods of Tokenization for Part-of-Speech Tagging
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Using Syntactic Dependency-Pairs Conflation to Improve Retrieval Performance in Spanish
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Towards the Development of Heuristics for Automatic Query Expansion
DEXA '01 Proceedings of the 12th International Conference on Database and Expert Systems Applications
A Common Solution for Tokenization and Part-of-Speech Tagging
TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
We consider a set of natural language processing techniques, based on finite-state technology, that can be used to analyze large amounts of text. These techniques include an advanced tokenizer, a part-of-speech tagger that can handle ambiguous streams of words, a system for conflating words by means of derivational mechanisms, and a shallow parser that extracts syntactic dependency pairs. We propose using these techniques to improve the performance of standard indexing engines.
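The following is a minimal illustrative sketch of how conflated words and dependency pairs might be combined into index terms. It is not the authors' system. It rests on several assumptions. The input is already tokenized and disambiguated into (lemma, tag) pairs, whereas the tagger described above handles ambiguous streams. The conflation table `CONFLATION` is a small hypothetical stand-in for finite-state derivational morphology. The "shallow parser" is reduced to a hypothetical adjacency rule over NOUN/ADJ tags. All function names are illustrative.

```python
from collections import Counter

# Hypothetical derivational conflation table: maps derived Spanish forms to a
# shared base. A real system would compute this with finite-state morphology.
CONFLATION = {
    "recuperación": "recuperar",
    "recuperable": "recuperar",
    "indexación": "indexar",
    "indexado": "indexar",
}


def conflate(lemma: str) -> str:
    """Map a lemma to its conflated base, or return it unchanged."""
    return CONFLATION.get(lemma, lemma)


def dependency_pairs(tagged):
    """Extract (head, modifier) pairs from adjacent noun/adjective tokens.

    A toy stand-in for a shallow parser: it only recognizes the patterns
    NOUN ADJ and ADJ NOUN, always treating the noun as the head.
    """
    pairs = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "NOUN" and t2 == "ADJ":
            pairs.append((conflate(w1), conflate(w2)))
        elif t1 == "ADJ" and t2 == "NOUN":
            pairs.append((conflate(w2), conflate(w1)))
    return pairs


def index_terms(tagged):
    """Count conflated content words plus 'head+modifier' pair terms."""
    content_tags = {"NOUN", "ADJ", "VERB"}
    terms = Counter(conflate(w) for w, t in tagged if t in content_tags)
    for head, mod in dependency_pairs(tagged):
        terms[f"{head}+{mod}"] += 1
    return terms


if __name__ == "__main__":
    sentence = [
        ("recuperación", "NOUN"),
        ("eficaz", "ADJ"),
        ("de", "ADP"),
        ("documento", "NOUN"),
        ("recuperable", "ADJ"),
    ]
    print(index_terms(sentence))
```

In this sketch, "recuperación" and "recuperable" both collapse to the single index term "recuperar". The pair terms "recuperar+eficaz" and "documento+recuperar" give an indexing engine additional, more specific terms to match on.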