Query expansion using domain-adapted, weighted thesaurus in an extended Boolean model

  • Authors:
  • Oh-Woog Kwon;Myoung-Cheol Kim;Key-Sun Choi

  • Affiliations:
  • Center for Artificial Intelligence Research, Department of Computer Science, Korea Advanced Institute of Science and Technology, 373-1, Kusung-dong, Yusung-gu, Taejon, 305-701, Korea (all authors)

  • Venue:
  • CIKM '94 Proceedings of the third international conference on Information and knowledge management
  • Year:
  • 1994

Abstract

In this paper, we address three important issues in query expansion using a thesaurus: how to weight the terms in expanded queries, how to select additional search terms from the thesaurus, and how to enrich the terms in a manual thesaurus (namely, thesaurus reconstruction). To weight the terms in expanded queries, we construct a weighted thesaurus that assigns a similarity value to each pair of related terms, computed from statistical co-occurrence in a corpus. To enrich the manual thesaurus, domain-dependent terms that occur in the corpus are inserted into the weighted thesaurus using the same co-occurrence information. We call the reconstructed thesaurus with weights a domain-adapted, weighted thesaurus. We then explain query expansion using the domain-adapted, weighted thesaurus in an extended Boolean retrieval model. To select additional search terms during query expansion, our model uses semi-automatic query expansion and a restriction method. In the experiments, our system achieved almost twice the recall of both a Boolean retrieval system that does not use a thesaurus and a query expansion retrieval system that uses the original thesaurus, while its precision was almost the same as that of the other systems.
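The abstract does not give the formulas used, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the Dice coefficient over document-level co-occurrence as the similarity measure, a simple weight threshold as a stand-in for the restriction method, and the p-norm OR operator as the extended Boolean scoring function. All function names and the `min_weight` and `p` parameters are hypothetical. The insertion of new domain-dependent terms into the thesaurus is not shown.

```python
from collections import Counter
from itertools import combinations


def dice_similarities(docs):
    """Dice coefficient for every term pair that co-occurs in a document.

    Assumption: the paper's actual co-occurrence statistic is not stated.
    """
    df = Counter()  # number of documents containing each term
    co = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        co.update(combinations(terms, 2))  # pairs come out in sorted order
    return {pair: 2 * n / (df[pair[0]] + df[pair[1]]) for pair, n in co.items()}


def weight_thesaurus(manual_links, sims):
    """Attach a corpus-derived weight to each link of a manual thesaurus."""
    weighted = {}
    for a, b in manual_links:
        w = sims.get(tuple(sorted((a, b))), 0.0)
        weighted.setdefault(a, {})[b] = w
        weighted.setdefault(b, {})[a] = w
    return weighted


def expand(term, thesaurus, min_weight=0.3):
    """Query term (weight 1.0) plus related terms at or above a threshold.

    Assumption: the threshold stands in for the paper's restriction method.
    """
    expanded = {term: 1.0}
    for related, w in thesaurus.get(term, {}).items():
        if w >= min_weight:
            expanded[related] = w
    return expanded


def pnorm_or(weighted_terms, doc_weights, p=2.0):
    """p-norm OR score of one document for a set of weighted query terms."""
    num = sum((qw ** p) * (doc_weights.get(t, 0.0) ** p)
              for t, qw in weighted_terms.items())
    den = sum(qw ** p for qw in weighted_terms.values())
    return (num / den) ** (1.0 / p) if den else 0.0


# Toy corpus: each document is a list of index terms (illustrative data only).
corpus = [
    ["retrieval", "search", "index"],
    ["retrieval", "search"],
    ["retrieval", "database"],
    ["search", "engine"],
]
links = [("retrieval", "search"), ("retrieval", "database")]

thesaurus = weight_thesaurus(links, dice_similarities(corpus))
query = expand("retrieval", thesaurus, min_weight=0.6)

# A document that mentions only "search" is missed by the unexpanded query
# but receives a nonzero score from the expanded, weighted query.
doc = {"search": 1.0}
print(pnorm_or({"retrieval": 1.0}, doc), pnorm_or(query, doc))
```

In this sketch the expansion weight lowers the contribution of a thesaurus term relative to the original query term, which is one plausible way an expanded query could raise recall without letting loosely related terms dominate the ranking.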