Using Text Segmentation to Enhance the Cluster Hypothesis

  • Authors:
  • Sylvain Lamprier;Tassadit Amghar;Bernard Levrat;Frédéric Saubion

  • Affiliations:
  • LERIA - University of Angers, Angers, France 49000 (all authors)

  • Venue:
  • AIMSA '08 Proceedings of the 13th international conference on Artificial Intelligence: Methodology, Systems, and Applications
  • Year:
  • 2008

Abstract

Passage Retrieval is an alternative approach to Information Retrieval that considers text fragments independently rather than assessing the global relevance of documents. In this setting, a document is not penalized when its relevant information is surrounded by text that deviates from the topic of interest. In this paper, we study the impact of considering such text fragments on a document clustering process. The use of clustering in Information Retrieval is mainly supported by the cluster hypothesis, which states that relevant documents tend to be more similar to one another than to non-relevant documents, so a clustering process is likely to group them together. Previous experiments have shown that clustering the top-ranked documents retrieved in response to a user's query allows Information Retrieval systems to improve their effectiveness. In the clustering processes used in these studies, documents were considered as wholes. However, the assumption that a document can refer to more than one topic or concept may also affect the document clustering process. Considering the passages of the retrieved documents separately may make it possible to create clusters that are more representative of the topics addressed. Different approaches have been assessed, and the results show that using text fragments in the clustering process may indeed be worthwhile.
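
The intuition behind passage-level comparison can be illustrated with a small sketch. It is not the method evaluated in the paper: the passage boundaries are written by hand instead of being produced by a text segmentation algorithm, similarity is plain bag-of-words cosine, and the function names (bag_of_words, cosine, best_passage_similarity) and the choice of the maximum passage-pair similarity are assumptions made only for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector of a text (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_passage_similarity(passages_a, passages_b):
    """Highest similarity over all passage pairs (an illustrative choice)."""
    return max(
        cosine(bag_of_words(p), bag_of_words(q))
        for p in passages_a
        for q in passages_b
    )


# Hypothetical toy data: doc_a covers two topics, doc_b only one.
# Passages are split by hand here, not by a segmentation algorithm.
doc_a = ["jaguar cars engine speed", "football match score goal"]
doc_b = ["jaguar engine speed test"]

# Whole-document comparison: the off-topic passage dilutes the similarity.
whole_sim = cosine(bag_of_words(" ".join(doc_a)), bag_of_words(" ".join(doc_b)))

# Passage-level comparison: the on-topic passage is compared on its own.
passage_sim = best_passage_similarity(doc_a, doc_b)

print(f"whole-document similarity: {whole_sim:.3f}")
print(f"best passage similarity:   {passage_sim:.3f}")
```

On this toy data the passage-level similarity is higher than the whole-document similarity, because the second passage of doc_a no longer lowers the score. A clustering algorithm fed passage-level similarities could therefore group the shared topic more tightly, which is the effect the abstract hypothesizes.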