Using semantic distance in a content-based heterogeneous information retrieval system

  • Authors:
  • Ahmad El Sayed; Hakim Hacid; Djamel Zighed

  • Affiliations:
  • University of Lyon 2, ERIC Laboratory, Bron cedex, France (all three authors)

  • Venue:
  • MCD'07: Proceedings of the 3rd ECML/PKDD International Conference on Mining Complex Data
  • Year:
  • 2007

Abstract

This paper makes two contributions to semantic information retrieval over heterogeneous documents (documents composed of text and images): (1) a new context-based semantic distance measure for textual data, and (2) an IR system that indexes documents conceptually and automatically, taking their heterogeneous content into account through a domain-specific ontology. The proposed semantic distance measure is used to fuzzify our domain ontology automatically. Both proposals were evaluated, with very promising results. On a set of word pairs, our semantic distance measure reached a correlation of 0.89 with human judgments, outperforming all the other measures compared. Preliminary combination results obtained on a specialized corpus of web pages are also reported.
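The abstract reports agreement with human judgments as a correlation over word pairs but does not name the statistic. Here is a minimal sketch of that kind of evaluation. It assumes a Pearson correlation coefficient, which is a common choice for word-pair benchmarks. The word pairs, human scores and distances are made-up placeholders, not data from the paper. The names `pearson`, `HUMAN_SIMILARITY` and `MEASURE_DISTANCE` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant sequence: correlation is undefined")
    return cov / (sx * sy)


# Hypothetical toy data, NOT taken from the paper:
# human similarity ratings (higher = more similar) ...
HUMAN_SIMILARITY = {
    ("car", "automobile"): 3.9,
    ("coast", "shore"): 3.6,
    ("food", "fruit"): 2.7,
    ("noon", "string"): 0.1,
}
# ... and distances from some semantic distance measure (lower = more similar).
MEASURE_DISTANCE = {
    ("car", "automobile"): 0.1,
    ("coast", "shore"): 0.2,
    ("food", "fruit"): 0.5,
    ("noon", "string"): 0.9,
}

pairs = sorted(HUMAN_SIMILARITY)
human = [HUMAN_SIMILARITY[p] for p in pairs]
# Negate distances so that a good measure yields a positive correlation
# with similarity ratings.
measure = [-MEASURE_DISTANCE[p] for p in pairs]

r = pearson(human, measure)
print(f"correlation with human judgments: {r:.2f}")
```

A distance shrinks as two words become more similar, so the sketch negates the distances before correlating them with similarity ratings. Without the negation, a good measure would show a strong *negative* correlation. Reporting the absolute value of r is an equivalent alternative.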