With the rapid growth in the number of web pages, search engines usually return a long ranked list of documents. Users must sift through the list, relying on each result's title and snippet (a short description of the document), to find the document they want. This approach works for simple, specific tasks but is less effective and efficient for ambiguous queries such as "apple" or "jaguar". One way to improve the effectiveness and efficiency of information retrieval is to organize the retrieved results into clusters automatically. This paper presents an improved Lingo algorithm, named Suffix Array Similarity Clustering (SASC), for clustering web search results. The method builds clusters using an improved suffix array, which ignores redundant suffixes, and computes document similarity from the titles and short snippets returned by Web search engines. Experiments show that SASC not only runs faster than Lingo but also achieves better cluster description quality and precision than Suffix Tree Clustering.
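As a rough illustration of the two ingredients named above, the following minimal sketch builds a word-level suffix array over a snippet, uses it to find repeated phrases, and scores the similarity of two snippets. It is not the SASC implementation: the suffix array is built by naive sorting with no pruning of redundant suffixes, Jaccard overlap is assumed as a stand-in similarity measure, and all function names are hypothetical.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction by sorting slices; fine for short snippets only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp_length(tokens: List[str], i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes at i and j."""
    n = 0
    while (i + n < len(tokens) and j + n < len(tokens)
           and tokens[i + n] == tokens[j + n]):
        n += 1
    return n


def repeated_phrases(tokens: List[str], min_len: int = 2) -> Set[Tuple[str, ...]]:
    """Phrases of at least min_len words shared by adjacent sorted suffixes."""
    sa = build_suffix_array(tokens)
    phrases = set()
    for a, b in zip(sa, sa[1:]):
        k = lcp_length(tokens, a, b)
        if k >= min_len:
            phrases.add(tuple(tokens[a:a + k]))
    return phrases


def jaccard(a: str, b: str) -> float:
    """Jaccard word overlap, assumed here as the similarity measure."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


tokens = tokenize("apple pie recipe and apple pie crust")
print(build_suffix_array(tokens))
print(repeated_phrases(tokens))
print(jaccard("jaguar car dealer", "jaguar car price"))
```

In a clustering setting, repeated phrases of this kind could serve as candidate cluster labels, and the pairwise similarity could decide which snippets are grouped under the same label.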