Query expansion with an automatically generated thesaurus

  • Authors:
  • José R. Pérez-Agüera; Lourdes Araujo

  • Affiliations:
  • Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid, Spain; Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid, Spain

  • Venue:
  • IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2006

Abstract

This paper describes a method for automatically building a new thesaurus that exploits previously collected information. The method draws on several resources: a text collection, a set of source thesauri, and other linguistic resources. Each phase of the process applies a different technique. First, indexing techniques applied to the text collection yield the initial set of terms of interest for the new thesaurus. These terms are then looked up in the source thesauri, which provides the initial structure of the new thesaurus. Finally, the new thesaurus is enriched by searching for new relationships among its terms. These relationships are first detected using similarity measures and are then assigned a type (equivalence, hierarchy or associativity) with the help of different linguistic resources. We evaluate the system by comparing the results obtained with and without the thesaurus in an information retrieval task proposed by the Cross-Language Evaluation Forum (CLEF). The experiments show a clear improvement in retrieval performance when the thesaurus is used.
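
The enrichment and query expansion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or indexing scheme the authors use, so everything below is an assumption made for illustration: terms are whitespace tokens, each term is represented by its per-document frequencies, related terms are found by cosine similarity, and the function names (`term_document_vectors`, `related_terms`, `expand_query`) and the `threshold` parameter are hypothetical. The lookup in source thesauri and the typing of relationships (equivalence, hierarchy, associativity) are left out.

```python
import math
from collections import defaultdict


def term_document_vectors(docs):
    """Map each term to a sparse vector {doc_id: term frequency}.

    Assumption: whitespace tokenization stands in for the paper's
    indexing step, which the abstract does not detail.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            vectors[token][doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_terms(term, vectors, threshold=0.5):
    """Return (term, score) pairs whose similarity meets the threshold.

    Assumption: cosine similarity is one possible similarity measure;
    the threshold value is arbitrary.
    """
    if term not in vectors:
        return []
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    kept = [(t, s) for t, s in scored if s >= threshold]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


def expand_query(query, vectors, threshold=0.5):
    """Append related terms to the original query terms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for other, _score in related_terms(term, vectors, threshold):
            if other not in expanded:
                expanded.append(other)
    return expanded


if __name__ == "__main__":
    docs = [
        "thesaurus term hierarchy",
        "thesaurus term",
        "retrieval query",
    ]
    vectors = term_document_vectors(docs)
    print(expand_query("thesaurus", vectors))
```

In this toy setting, terms that co-occur in the same documents receive high similarity and are added to the query. A real system would weight terms (for example with tf-idf), filter stop words, and use the relationship type to decide which related terms are safe to add.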