WISE: Hierarchical Soft Clustering of Web Page Search Results Based on Web Content Mining Techniques

  • Authors:
  • Ricardo Campos; Gael Dias; Celia Nunes

  • Affiliations:
  • University of Beira Interior, Portugal; University of Beira Interior, Portugal; University of Beira Interior, Portugal

  • Venue:
  • WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2006

Abstract

Search engines typically exhibit low precision in response to a query, retrieving many irrelevant web pages while missing other important ones. In this paper, we study the problem of hierarchically clustering web page search results. In particular, we propose an architecture called WISE [1], a meta-search engine that automatically builds clusters of related web pages, each embodying one meaning of the query. These clusters are then hierarchically organized and labeled with a phrase representing the key concept of the cluster and its corresponding web documents. The system, which has a web-based interface (soon available at wise.di.ubi.pt), introduces some interesting new ideas, such as the pre-selection of the retrieved web pages, the capacity to statistically detect phrases within documents, and the representation of documents based on their most relevant key concepts by using web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm that groups the selected documents into a hierarchy of clusters.
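
The idea of graph-based overlapping (soft) clustering mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the WISE algorithm, whose details the abstract does not give. It assumes, purely for illustration, that each document is already reduced to a set of key concepts, that two documents are linked when their Jaccard similarity reaches a hypothetical `threshold`, and that clusters are the maximal cliques of the resulting graph. It produces a flat set of labeled clusters and does not build a hierarchy.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of key concepts."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def maximal_cliques(neighbours):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & neighbours[v], x & neighbours[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(neighbours), set())
    return cliques


def overlapping_clusters(docs, threshold=0.3):
    """Return sorted (label, members) pairs; a document may occur in several.

    docs maps a document id to its set of key concepts (assumed input format).
    threshold is an illustrative similarity cut-off, not a value from the paper.
    """
    # Build an undirected similarity graph as an adjacency map.
    neighbours = {doc_id: set() for doc_id in docs}
    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    labelled = []
    for clique in maximal_cliques(neighbours):
        # Label each cluster with its most frequent key concept;
        # ties are broken alphabetically to keep the result deterministic.
        counts = Counter(c for d in clique for c in docs[d])
        label = min(counts, key=lambda c: (-counts[c], c))
        labelled.append((label, sorted(clique)))
    return sorted(labelled)


# Hypothetical results for the ambiguous query "jaguar".
docs = {
    "d1": {"jaguar car", "luxury sedan"},
    "d2": {"jaguar car", "dealer"},
    "d3": {"big cat", "rainforest"},
    "d4": {"big cat", "jaguar car"},
}
for label, members in overlapping_clusters(docs):
    print(label, members)
```

In this toy input, document `d4` mentions both senses of the query, so it ends up in both the "big cat" and the "jaguar car" clusters, which is the behavior that distinguishes soft clustering from a hard partition.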