Combining vector space model and multi word term extraction for semantic query expansion

  • Authors:
  • Eric SanJuan;Fidelia Ibekwe-SanJuan;Juan-Manuel Torres-Moreno;Patricia Velázquez-Morales

  • Affiliations:
  • Laboratoire Informatique d'Avignon UAPV, Avignon Cedex 9, France;ELICO, Université de Lyon 3. Lyon Cedex, France;Laboratoire Informatique d'Avignon UAPV, Avignon Cedex 9 and ELICO, Université de Lyon 3. Lyon Cedex, France;École Polytechnique/DGI Montréal, Canada

  • Venue:
  • NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2007

Abstract

In this paper, we target document ranking in a highly technical field, with the aim of approximating a ranking obtained through an existing ontology (knowledge structure). We test and combine symbolic and vector space model (VSM) approaches. Our symbolic approach relies on shallow NLP and on internal linguistic relations between Multi-Word Terms (MWTs). Documents are ranked according to the semantic relations they share with the query terms, either directly or indirectly after clustering the MWTs using the identified lexico-semantic relations. The VSM approach consists in ranking documents with different functions, ranging from the classical tf.idf to more elaborate similarity functions. Results show that the ranking obtained by the symbolic approach performs better on most queries than the vector space model. However, the ranking obtained by combining both approaches outperforms by a wide margin the results obtained by the methods of either approach alone.
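
As an illustration of the VSM side only, the following minimal sketch ranks tokenized documents against a query using classical tf.idf weights and cosine similarity. The abstract does not state how the symbolic and VSM rankings are combined, so the `combine` function below is a purely hypothetical linear interpolation, and all names (`tfidf_vectors`, `rank_tfidf`, `combine`, `alpha`) are assumptions introduced for this example rather than the authors' method.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return tf.idf vectors (dicts) for tokenized docs, plus the idf table."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_tfidf(query, docs):
    """Score each document against the query; higher means more similar."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(query).items() if t in idf}
    return [cosine(q, v) for v in vecs]

def combine(vsm_scores, symbolic_scores, alpha=0.5):
    """Hypothetical linear mix of two score lists (not from the paper)."""
    return [alpha * a + (1 - alpha) * b
            for a, b in zip(vsm_scores, symbolic_scores)]

docs = [
    ["gene", "expression", "analysis"],
    ["protein", "structure"],
    ["gene", "therapy", "gene"],
]
scores = rank_tfidf(["gene"], docs)
# Document indices sorted from most to least similar to the query.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In a setting like the one described, the `symbolic_scores` argument would come from a separate scoring step based on the relations between MWTs; here it is left as an input because the abstract gives no details of that computation.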