Search Results Clustering Based on Suffix Array and VSM

  • Authors:
  • Shunlai Bai;Wenhao Zhu;Bofeng Zhang;Jianhua Ma

  • Venue:
  • GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
  • Year:
  • 2010

Abstract

With the rapid growth of web pages, search engines usually return a long ranked list of documents. Users must sift through this list, reading each result's title and snippet (a short description of the document), to find the document they want. This works reasonably well for simple, specific tasks but is less effective and efficient for ambiguous queries such as "apple" or "jaguar". An alternative that improves both the effectiveness and the efficiency of information retrieval is to organize the retrieved results into clusters automatically. This paper presents an improved Lingo algorithm, named Suffix Array Similarity Clustering (SASC), for clustering web search results. SASC builds clusters by using an improved suffix array that ignores redundant suffixes and by computing document similarity from the titles and short snippets returned by web search engines. Experiments show that SASC runs faster than Lingo and achieves better cluster description quality and precision than Suffix Tree Clustering (STC).
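The abstract names two ingredients, a suffix array built over the snippets and document similarity in the vector space model (VSM), without giving their details. The Python below is a minimal sketch of how such pieces can fit together, not the authors' SASC implementation. It builds a naive word-level suffix array, uses longest-common-prefix runs between neighbouring suffixes to find phrases shared by at least two snippets, and scores snippets against those candidate phrases with cosine similarity over term-frequency vectors. The paper's pruning of redundant suffixes is not reproduced. The helper names (`tokenize`, `frequent_phrases`, `cosine`, `cluster`), the `STOP_WORDS` list, and the rule that assigns each snippet to its most similar contained phrase are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; candidate labels may not start or end with one of these.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    # Minimal preprocessing (assumption): lowercase and split on whitespace.
    return text.lower().split()


def frequent_phrases(docs, min_docs=2):
    """Return {phrase tuple: set of doc ids} for word sequences shared by >= min_docs snippets."""
    tokens, owner = [], []
    for d, text in enumerate(docs):
        for word in tokenize(text):
            tokens.append(word)
            owner.append(d)
        # A unique end marker per document keeps common prefixes from spanning two snippets.
        tokens.append(f"<end{d}>")
        owner.append(d)

    # Word-level suffix array: start positions sorted by the token sequence that follows.
    # This naive O(n^2 log n) build is only meant for small inputs in a sketch.
    sa = sorted(range(len(tokens)), key=lambda i: tokens[i:])

    phrase_docs = defaultdict(set)
    for a, b in zip(sa, sa[1:]):
        # Longest common prefix, in tokens, of two neighbouring suffixes.
        n = 0
        while a + n < len(tokens) and b + n < len(tokens) and tokens[a + n] == tokens[b + n]:
            n += 1
        # Every prefix of the shared run occurs at both positions.
        for k in range(1, n + 1):
            phrase_docs[tuple(tokens[a:a + k])].update((owner[a], owner[b]))

    return {
        p: ds
        for p, ds in phrase_docs.items()
        if len(ds) >= min_docs and p[0] not in STOP_WORDS and p[-1] not in STOP_WORDS
    }


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (VSM)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def cluster(docs, min_docs=2):
    """Hypothetical assignment: each snippet joins the most similar label phrase it contains."""
    labels = frequent_phrases(docs, min_docs)
    clusters = defaultdict(list)
    for d, text in enumerate(docs):
        vec = Counter(tokenize(text))
        candidates = [p for p, ds in labels.items() if d in ds]
        if not candidates:
            clusters["Other"].append(d)
            continue
        # Prefer higher cosine similarity, then longer (more descriptive) phrases.
        best = max(candidates, key=lambda p: (cosine(vec, Counter(p)), len(p)))
        clusters[" ".join(best)].append(d)
    return dict(clusters)


snippets = [
    "apple iphone release date",
    "new apple iphone review",
    "jaguar car dealer",
    "used jaguar car prices",
]
print(cluster(snippets))  # → {'apple iphone': [0, 1], 'jaguar car': [2, 3]}
```

For a real result set, the naive sort would typically be replaced by an O(n log n) or linear-time suffix array construction, and raw term counts by TF-IDF weights; both are standard substitutions, not details taken from the paper.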