A language for manipulating clustered web documents results

  • Authors:
  • Gloria Bordogna;Alessandro Campi;Giuseppe Psaila;Stefania Ronchi

  • Affiliations:
  • CNR, Dalmine (BG), Italy;Politecnico di Milano, Milano, Italy;Università di Bergamo, Dalmine (BG), Italy;Università di Bergamo, Dalmine (BG), Italy

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Abstract

We propose a novel language for exploring the results retrieved by several internet search services (such as search engines) that cluster the documents they retrieve. The goal is to offer users a tool for discovering relevant hidden relationships between clustered documents. The proposal is motivated by the observation that visualization paradigms, based on either ranked lists or clustered results, do not allow users to fully exploit the combined use of several search services to answer a request. When the same query is submitted to distinct search services, they may produce partially overlapping clustered results, in which clusters identified by distinct labels share some documents. Conversely, clusters with similar labels but containing distinct documents may be produced as well. In such a situation, it may be useful to compare, combine and rank the cluster contents in order to single out the relevant documents. In the proposed language, we define several operators (inspired by relational algebra) that work on groups of clusters. New clusters (and groups) can be generated by combining (i.e., overlapping, refining and intersecting) clusters (and groups) in a set-oriented fashion. Furthermore, several ranking functions are proposed to model distinct semantics of the combination.
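The set-oriented combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the operators' definitions, so everything below is an assumption made for illustration: a cluster is modeled as a label plus a set of document identifiers, a group as a list of clusters, `intersect_groups` as a pairwise intersection in the spirit of a relational join, and `jaccard_rank` as one possible ranking function. None of these names or semantics are taken from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model: a labeled set of document identifiers."""
    label: str
    docs: frozenset


def intersect_groups(g1, g2):
    """Pairwise-intersect two groups of clusters (illustrative operator).

    Each pair of clusters sharing at least one document yields a new
    cluster holding the shared documents; empty intersections are dropped.
    """
    result = []
    for a, b in product(g1, g2):
        common = a.docs & b.docs
        if common:
            result.append(Cluster(f"{a.label} AND {b.label}", common))
    return result


def jaccard_rank(a, b):
    """Score a pair of clusters by Jaccard overlap (an assumed ranking)."""
    union = a.docs | b.docs
    return len(a.docs & b.docs) / len(union) if union else 0.0


# Made-up results of the same query sent to two search services.
engine_a = [
    Cluster("jaguar car", frozenset({"d1", "d2", "d3"})),
    Cluster("jaguar animal", frozenset({"d4", "d5"})),
]
engine_b = [
    Cluster("automobiles", frozenset({"d2", "d3", "d6"})),
    Cluster("wildlife", frozenset({"d5", "d7"})),
]

combined = intersect_groups(engine_a, engine_b)
for cluster in combined:
    print(cluster.label, sorted(cluster.docs))
```

In this toy data, clusters with different labels ("jaguar car" and "automobiles") share documents, which is the situation the abstract describes. Other ranking semantics, such as the raw size of the intersection, could be substituted for the Jaccard score.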