Practical NLP-Based Text Indexing

Authors:
Jesús Vilares Ferro; Francisco-Mario Barcala; Miguel A. Alonso; Jorge Graña Gil; Manuel Vilares Ferro
Venue:
IBERAMIA 2002 Proceedings of the 8th Ibero-American Conference on AI: Advances in Artificial Intelligence
Year:
2002

Abstract

We consider a set of natural language processing techniques based on finite-state technology that can be used to analyze large volumes of text. These techniques include an advanced tokenizer, a part-of-speech tagger that can handle ambiguous streams of words, a system for conflating words by means of derivational mechanisms, and a shallow parser that extracts syntactic-dependency pairs. We propose using these techniques to improve the performance of standard indexing engines.
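
As a rough illustration of how conflated words and word pairs could serve as index terms, the following is a minimal Python sketch and not the authors' system. Everything in it is an assumption made for illustration: the regular-expression tokenizer, the `SUFFIXES` list (a crude English stand-in for derivational conflation), the use of adjacent stems in place of parser-derived syntactic-dependency pairs, and the function names `tokenize`, `conflate`, `index_terms`, and `build_index`.

```python
import re
from collections import defaultdict

# Hypothetical suffix list standing in for derivational conflation;
# the actual morphological rules are not given in this record.
SUFFIXES = ("ization", "ation", "ing", "er", "s")


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def conflate(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return conflated single terms followed by adjacent-pair terms.

    Adjacent pairs are only a placeholder for the syntactic-dependency
    pairs that a shallow parser would extract.
    """
    stems = [conflate(token) for token in tokenize(text)]
    pairs = [f"{head}+{modifier}" for head, modifier in zip(stems, stems[1:])]
    return stems + pairs


def build_index(docs):
    """Map each index term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return index
```

With this sketch, the toy documents "indexing texts" and "text indexer" share the single terms `index` and `text`, while their pair terms (`index+text` and `text+index`) remain distinct, which hints at why pair terms can be more selective than single terms.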