Conceptual indexing based on document content representation

  • Authors:
  • Mustapha Baziz; Mohand Boughanem; Nathalie Aussenac-Gilles

  • Affiliations:
  • IRIT, Toulouse Cedex 4, France; IRIT, Toulouse Cedex 4, France; IRIT, Toulouse Cedex 4, France

  • Venue:
  • CoLIS'05 Proceedings of the 5th international conference on Context: conceptions of Library and Information Sciences
  • Year:
  • 2005

Abstract

This paper addresses an important problem related to the use of semantics in IR: how to represent document semantics and how to use that representation properly in retrieval. The approach we propose represents the content of a document by its best semantic network, called the document semantic core, which is built in two main steps. In the first step, concepts (words and phrases) are extracted from the document, driven by an external general-purpose ontology, namely WordNet. In the second step, a global disambiguation of the extracted concepts with respect to the document yields the best semantic network. Thus, the selected concepts form the nodes of the semantic network, and the similarity values between connected nodes weight its links. The resulting scored concepts are used for conceptual indexing of documents in Information Retrieval.
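The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure, the disambiguation search strategy, or the concept scoring function, so everything below is an assumption for illustration only: WordNet lookup is replaced by a hypothetical hard-coded sense inventory (`CANDIDATE_SENSES`), the similarity values (`SIMILARITY`) are made up, disambiguation is an exhaustive search for the sense assignment with the largest total pairwise similarity, and each concept is scored by the sum of the weights of its links. This is not the authors' method, only one plausible instance of "global disambiguation, then a similarity-weighted semantic network".

```python
from itertools import combinations, product

# HYPOTHETICAL toy data: each extracted concept maps to its candidate senses.
# In the paper the candidates come from WordNet; these labels are invented.
CANDIDATE_SENSES = {
    "bank": ["bank#river", "bank#finance"],
    "loan": ["loan#money"],
    "interest": ["interest#hobby", "interest#finance"],
}

# HYPOTHETICAL symmetric sense-to-sense similarity values, standing in for
# whatever WordNet-based similarity measure the real system uses.
SIMILARITY = {
    frozenset({"bank#finance", "loan#money"}): 0.8,
    frozenset({"bank#finance", "interest#finance"}): 0.7,
    frozenset({"loan#money", "interest#finance"}): 0.6,
    frozenset({"bank#river", "interest#hobby"}): 0.1,
}


def sim(a, b):
    """Similarity between two senses; 0.0 when no value is defined."""
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def disambiguate(candidates):
    """Global disambiguation (assumed strategy): try every one-sense-per-concept
    assignment and keep the one whose pairwise similarities sum highest."""
    concepts = list(candidates)
    best, best_total = None, float("-inf")
    for senses in product(*(candidates[c] for c in concepts)):
        total = sum(sim(a, b) for a, b in combinations(senses, 2))
        if total > best_total:
            best, best_total = dict(zip(concepts, senses)), total
    return best


def semantic_network(assignment):
    """Nodes are the selected senses; each link is weighted by their similarity.
    Pairs with zero similarity are treated as unconnected."""
    nodes = list(assignment.values())
    edges = {(a, b): sim(a, b) for a, b in combinations(nodes, 2) if sim(a, b) > 0}
    return nodes, edges


def score_concepts(nodes, edges):
    """Assumed scoring: each node's score is the sum of its incident link weights."""
    scores = {n: 0.0 for n in nodes}
    for (a, b), w in edges.items():
        scores[a] += w
        scores[b] += w
    return scores


if __name__ == "__main__":
    core = disambiguate(CANDIDATE_SENSES)
    nodes, edges = semantic_network(core)
    print(core)
    print(score_concepts(nodes, edges))
```

With this toy data the financial senses of "bank" and "interest" are selected, because together with "loan#money" they form the most strongly connected network. Exhaustive search is exponential in the number of ambiguous concepts, so a real system over a full document would need a cheaper search strategy.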