Lazy Query Enrichment: A Method for Indexing Large Specialized Document Bases with Morphology and Concept Hierarchy

Authors:
Alexander F. Gelbukh
Affiliations:
-
Venue:
DEXA '00 Proceedings of the 11th International Conference on Database and Expert Systems Applications
Year:
2000

Abstract

A full-text information retrieval system has to deal with various phenomena of string equivalence: case-insensitive matching, morphological inflection, derivation, synonymy, and hyponymy or hyperonymy. Technically, this can be handled either at indexing time, by reducing equivalent strings to a common form, or at query-processing time, by enriching the query with the whole set of equivalent forms. We argue that the latter approach allows for greater flexibility and easier maintenance, while being more affordable than is usually assumed. Our proposal is to enrich the query only with those forms that actually appear in the document base. Our experiments with a thesaurus-based information retrieval system showed only an insignificant increase in query size on average with a 200-megabyte document base, even for the highly inflective Spanish language.
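
The idea of enriching the query only with forms that occur in the document base can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`build_index`, `candidate_forms`, `enrich`, `search`), the toy `INFLECTIONS` and `HYPONYMS` dictionaries, and the three sample documents are all assumptions made for illustration. The index stores literal word forms with no stemming, and the query is expanded at search time and then filtered against the index vocabulary.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index over literal lowercased word forms (no stemming at indexing time)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:!?")].add(doc_id)
    return index

# Hypothetical toy resources standing in for a morphological dictionary
# and a concept hierarchy; they are not taken from the paper.
INFLECTIONS = {
    "animal": {"animal", "animales"},
    "felino": {"felino", "felinos", "felina", "felinas"},
    "gato": {"gato", "gatos", "gata", "gatas"},
}
HYPONYMS = {"animal": {"felino"}, "felino": {"gato"}}

def candidate_forms(term):
    """All forms equivalent to the term: inflections of it and of its transitive hyponyms."""
    concepts, stack = set(), [term]
    while stack:
        concept = stack.pop()
        if concept not in concepts:
            concepts.add(concept)
            stack.extend(HYPONYMS.get(concept, ()))
    forms = set()
    for concept in concepts:
        # Fall back to the bare term when no inflection table entry exists.
        forms |= INFLECTIONS.get(concept, {concept})
    return forms

def enrich(term, index):
    """Lazy enrichment: keep only the candidate forms present in the index vocabulary."""
    return {form for form in candidate_forms(term.lower()) if form in index}

def search(term, index):
    """Union of the posting sets of every enriched form."""
    result = set()
    for form in enrich(term, index):
        result |= index[form]
    return result

docs = {
    1: "El gato duerme.",
    2: "Los felinos cazan de noche.",
    3: "El perro ladra.",
}
index = build_index(docs)

print(len(candidate_forms("animal")))    # size of the full expansion
print(sorted(enrich("animal", index)))   # forms that actually occur
print(sorted(search("animal", index)))   # matching document ids
```

In this toy setting the full expansion of "animal" contains ten forms, but only two of them occur in the documents, so the enriched query stays small. Because the index itself keeps the literal forms, changing the dictionary or the hierarchy does not require re-indexing.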