STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching

Authors:
Stella Kopidaki;Panagiotis Papadakos;Yannis Tzitzikas
Affiliations:
Institute of Computer Science, FORTH-ICS, Greece, Computer Science Department, University of Crete, Greece;Institute of Computer Science, FORTH-ICS, Greece, Computer Science Department, University of Crete, Greece;Institute of Computer Science, FORTH-ICS, Greece, Computer Science Department, University of Crete, Greece
Venue:
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Year:
2009

Abstract

Results clustering in Web searching is useful for providing users with overviews of the results, allowing them to restrict their focus to the desired parts. However, the task of deriving single-word or multiple-word names for the clusters (usually referred to as cluster labeling) is difficult, because the names have to be syntactically correct and predictive. Moreover, efficiency is an important requirement, since results clustering is an online task. Suffix Tree Clustering (STC) is a clustering technique in which search results (mainly snippets) can be clustered fast (in linear time) and incrementally, and each cluster is labeled with a phrase. In this paper we introduce: (a) a variation of STC, called STC+, which uses a scoring formula that favors phrases occurring in document titles and differs in the way base clusters are merged, and (b) a novel non-merging algorithm, called NM-STC, that results in hierarchically organized clusters. The comparative user evaluation showed that both STC+ and NM-STC are significantly preferred over STC, and that NM-STC is about two times faster than STC and STC+.
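
To make the idea of phrase-labeled clusters more concrete, the following is a minimal Python sketch of how candidate clusters could be formed from phrases shared by several results, with extra weight given to phrases that appear in titles. It is an illustration under stated assumptions, not the STC, STC+ or NM-STC algorithm of the paper: it enumerates word n-grams instead of building a suffix tree (so it is not linear-time), it performs no merging or hierarchy construction, and the function names, the parameters `max_len`, `min_docs` and `title_boost`, and the scoring expression are all hypothetical.

```python
from collections import defaultdict


def phrases(text, max_len=3):
    """Yield every word n-gram (1..max_len words) of a lower-cased text."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def base_clusters(results, max_len=3, min_docs=2, title_boost=2.0):
    """Group results by shared phrases.

    results: list of (title, snippet) pairs.
    Returns (label, doc_ids, score) tuples sorted by descending score.
    """
    docs_with = defaultdict(set)  # phrase -> ids of results containing it
    in_title = defaultdict(set)   # phrase -> ids whose title contains it
    for doc_id, (title, snippet) in enumerate(results):
        for p in phrases(title, max_len):
            docs_with[p].add(doc_id)
            in_title[p].add(doc_id)
        for p in phrases(snippet, max_len):
            docs_with[p].add(doc_id)

    clusters = []
    for p, ids in docs_with.items():
        if len(ids) < min_docs:
            continue  # a phrase found in a single result does not form a cluster
        length = len(p.split())
        # Hypothetical score (an assumption, not the paper's formula):
        # coverage times phrase length, plus a bonus per title occurrence.
        score = len(ids) * length + title_boost * len(in_title.get(p, ()))
        clusters.append((p, sorted(ids), score))

    # Highest score first; ties are broken alphabetically by label.
    clusters.sort(key=lambda c: (-c[2], c[0]))
    return clusters
```

Calling `base_clusters` on a list of `(title, snippet)` pairs returns candidate clusters whose labels are the shared phrases themselves, which is the property that makes phrase-based clustering attractive for labeling. A suffix tree would find the same shared phrases without enumerating every n-gram.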