Search Results Clustering Based on Suffix Array and VSM

Authors:
Shunlai Bai; Wenhao Zhu; Bofeng Zhang; Jianhua Ma
Venue:
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Year:
2010

Citing 5
Cited 1

Web document clustering: a feasibility demonstration. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Suffix arrays: a new method for on-line string searches. SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
MARSYAS: a framework for audio analysis. Organised Sound
A survey of Web clustering engines. ACM Computing Surveys (CSUR)
Context-aware query classification. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Association rule centric clustering of web search results. MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence

Abstract

With the rapid growth of the web, search engines usually return a long ranked list of documents. Users must sift through the list, relying on each result's title and snippet (a short description of the document), to find the document they want. This approach works for simple, specific tasks but is less effective and efficient for ambiguous queries such as "apple" or "jaguar". An alternative that improves the effectiveness and efficiency of information retrieval is to organize the retrieved results into clusters automatically. This paper presents an improved Lingo algorithm, named Suffix Array Similarity Clustering (SASC), for clustering web search results. The method builds clusters using an improved suffix array that ignores redundant suffixes, and computes document similarity from the titles and short snippets returned by web search engines. Experiments show that SASC runs faster than Lingo and also yields better cluster description quality and precision than Suffix Tree Clustering.
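The two ingredients named in the abstract, a suffix array over snippet text and vector-space similarity between documents, can be illustrated with a minimal sketch. This is not the authors' SASC implementation: it uses a plain word-level suffix array without the redundant-suffix pruning the abstract mentions, raw term frequencies instead of any particular weighting scheme, and the function names (`tokenize`, `frequent_phrases`, `cosine`) and the `min_docs` and `max_len` parameters are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a snippet and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequent_phrases(docs, min_docs=2, max_len=3):
    """Return {phrase: set of doc ids} for phrases shared by >= min_docs docs.

    Assumption: a plain word-level suffix array, with suffixes truncated
    to max_len words; no redundant-suffix pruning is attempted.
    """
    tokens = [tokenize(d) for d in docs]
    # One suffix per (document, start position).
    suffixes = [(d, i) for d, toks in enumerate(tokens) for i in range(len(toks))]

    def key(suffix):
        d, i = suffix
        return tokens[d][i:i + max_len]

    # Sorting places suffixes that share a prefix next to each other.
    suffixes.sort(key=key)

    phrases = defaultdict(set)
    for left, right in zip(suffixes, suffixes[1:]):
        a, b = key(left), key(right)
        # Longest common prefix (in words) of two adjacent suffixes.
        lcp = 0
        while lcp < min(len(a), len(b)) and a[lcp] == b[lcp]:
            lcp += 1
        # Every shared prefix is a phrase occurring in both documents.
        for n in range(1, lcp + 1):
            phrases[tuple(a[:n])].update((left[0], right[0]))
    return {p: ids for p, ids in phrases.items() if len(ids) >= min_docs}


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    snippets = [
        "Jaguar car dealers and new models",
        "Used jaguar car prices",
        "Jaguar cat habitat in the rainforest",
        "The jaguar cat is a big predator",
    ]
    for phrase, ids in sorted(frequent_phrases(snippets).items()):
        print(" ".join(phrase), sorted(ids))
    vectors = [Counter(tokenize(s)) for s in snippets]
    print(round(cosine(vectors[0], vectors[1]), 3))
```

In this sketch, shared phrases such as "jaguar car" and "jaguar cat" act as candidate cluster labels, and the cosine score could be used to assign or merge snippets around those labels. How SASC actually combines the two steps is not specified in the abstract.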