Handling Vagueness in Information Retrieval Systems

  • Authors:
  • Gloria Bordogna;Gabriella Pasi

  • Affiliations:
  • -;-

  • Venue:
  • ANNES '95 Proceedings of the 2nd New Zealand Two-Stream International Conference on Artificial Neural Networks and Expert Systems
  • Year:
  • 1995

Abstract

In recent years, a branch of research in Information Retrieval (IR) has addressed the problem of modeling the vagueness that invariably characterizes the management of information. Some approaches have coped with this problem by trying to process natural language directly, at least to some extent; their main limitation is their range of applicability, since interpreting the meaning of documents requires too many decision rules even in narrow application areas. A second class of approaches is more general: its objective is to define retrieval models that deal with vagueness in information management independently of the application field. In particular, a large part of the research in this area has addressed the problem of extending the Boolean information retrieval model with a twofold aim: to incorporate more specific and accurate representations of documents' contents in IR systems (IRSs), and to make the query language more expressive and natural than the usual Boolean language. In both cases, vagueness and subjectivity are two salient characteristics affecting the information retrieval activity. Fuzzy set theory is a formal framework well suited to modeling vagueness: in IR it has been successfully employed to define a "superstructure" of the Boolean model, with the appealing consequence that existing Boolean IRSs can be improved without redesigning them completely. In this paper, some fuzzy extensions of IRSs are presented. First, the incorporation of index term weights in the representation of documents' content is presented as a first means of softening the retrieval activity. Second, query term weights with different semantics are discussed. A large part of the paper describes the fuzzy approaches that introduce linguistic features into IR models.
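The idea of index term weights softening a Boolean query can be illustrated with a minimal sketch. It assumes the common min/max/complement interpretation of AND/OR/NOT, which the abstract does not specify; the documents, weights, and function names are all hypothetical and are not taken from the paper.

```python
# Hypothetical index term weights F(d, t) in [0, 1]: the degree to which
# term t describes document d. All values are made up for illustration.
INDEX_WEIGHTS = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.4},
    "d2": {"fuzzy": 0.3, "retrieval": 0.8},
    "d3": {"retrieval": 1.0},
}


def term_degree(doc_id, term):
    """Degree to which a document satisfies a single-term query."""
    return INDEX_WEIGHTS[doc_id].get(term, 0.0)


def fuzzy_and(*degrees):
    """AND as the minimum (an assumed, commonly used choice)."""
    return min(degrees)


def fuzzy_or(*degrees):
    """OR as the maximum (an assumed, commonly used choice)."""
    return max(degrees)


def fuzzy_not(degree):
    """NOT as the standard complement."""
    return 1.0 - degree


def rank(evaluate):
    """Rank all documents by decreasing degree of satisfaction."""
    scored = [(doc_id, evaluate(doc_id)) for doc_id in INDEX_WEIGHTS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Query: fuzzy AND retrieval
ranking = rank(
    lambda d: fuzzy_and(term_degree(d, "fuzzy"), term_degree(d, "retrieval"))
)
print(ranking)
```

With weights restricted to 0 and 1, these operators reduce to ordinary Boolean evaluation, which is the sense in which such an extension sits on top of an existing Boolean system rather than replacing it.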
Linguistic extensions have been defined in order to allow:
- the specification of subjective criteria for interpreting the content of documents;
- the specification of different importances of the query terms in qualifying the desired documents, by the use of linguistic values associated with each query term, such as "important", "very important", etc.;
- the specification of qualitative aggregation criteria of the search terms, lying between the AND and the OR and identified by quantifiers such as "at least k", "almost k", "more than k", etc.;
- the specification of optional selection criteria with respect to essential ones, by the definition of the operator "and possibly".
The notions of linguistic variable, OWA operator on criteria with unequal importances, and non-monotonic intersection have been employed.
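To make the aggregation "between the AND and the OR" concrete, the following is a minimal sketch of an ordered weighted averaging (OWA) operator driven by a crisp "at least k" quantifier. It treats all criteria as equally important, so it does not cover the unequal-importance case mentioned above, and the function names are hypothetical rather than taken from the paper.

```python
def owa(degrees, weights):
    """Ordered weighted average: weights apply to the degrees sorted in
    decreasing order, not to specific criteria."""
    if len(degrees) != len(weights):
        raise ValueError("degrees and weights must have the same length")
    ordered = sorted(degrees, reverse=True)
    return sum(w * d for w, d in zip(weights, ordered))


def at_least_k_weights(n, k):
    """OWA weights for a crisp 'at least k of n' quantifier (assumed form).

    All weight goes to the k-th largest degree, so the result is high only
    if at least k criteria are satisfied to a high degree.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    return [1.0 if position == k else 0.0 for position in range(1, n + 1)]


# Degrees to which one document satisfies three search terms (illustrative).
degrees = [0.2, 0.9, 0.5]

any_term = owa(degrees, at_least_k_weights(3, 1))   # behaves like OR (max)
two_terms = owa(degrees, at_least_k_weights(3, 2))  # between OR and AND
all_terms = owa(degrees, at_least_k_weights(3, 3))  # behaves like AND (min)
print(any_term, two_terms, all_terms)
```

Setting k to 1 recovers the maximum and setting k to the number of terms recovers the minimum; softer quantifiers such as "almost k" would spread the weight over several positions instead of concentrating it on one.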