Improving Web Clustering by Cluster Selection

Authors:
Daniel Crabtree;Xiaoying Gao;Peter Andreae
Affiliations:
Victoria University of Wellington;Victoria University of Wellington;Victoria University of Wellington
Venue:
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2005

Citing 0
Cited 14

Using symbolic objects to cluster web documents

Proceedings of the 15th international conference on World Wide Web
Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles

Integrated Computer-Aided Engineering
Constructing Web Corpora through Topical Web Partitioning for Term Recognition

AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
Full-Subtopic Retrieval with Keyphrase-Based Search Results Clustering

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching

WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy

ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
QC4: a clustering evaluation method

PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Exploratory web searching with dynamic taxonomies and results clustering

ECDL'09 Proceedings of the 13th European conference on Research and advanced technology for digital libraries
Inducing word senses to improve web search result clustering

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A Clustering-Driven LDAP Framework

ACM Transactions on the Web (TWEB)
Clustering web search results with maximum spanning trees

AI*IA'11 Proceedings of the 12th international conference on Artificial intelligence around man and beyond
Automatically structuring domain knowledge from text: An overview of current research

Information Processing and Management: an International Journal
Improving suffix tree clustering with new ranking and similarity measures

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
Result disambiguation in web people search

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Abstract

Web page clustering is a technology that puts semantically related web pages into groups and is useful for categorizing, organizing, and refining search results. When clustering using only textual information, Suffix Tree Clustering (STC) outperforms other clustering algorithms by making use of phrases and allowing clusters to overlap. One problem of STC and other similar algorithms is how to select a small set of clusters to display to the user from a very large set of generated clusters. The cluster selection method used in STC is flawed in that it does not handle overlapping clusters appropriately. This paper introduces a new cluster scoring function and a new cluster selection algorithm to overcome the problems with overlapping clusters; these are combined with STC to make a new clustering algorithm, ESTC. This paper's experiments show that ESTC significantly outperforms STC and that, even with less data, ESTC performs similarly to a commercial clustering search engine.
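
To make the cluster selection problem concrete, the sketch below picks a small set of clusters from a larger pool while discounting pages that earlier picks already cover. It is a generic greedy coverage heuristic, not the scoring function or selection algorithm of ESTC, which the abstract does not specify. The function name `select_clusters`, the cluster labels, and the page ids are all hypothetical toy values.

```python
def select_clusters(clusters, k):
    """Greedily pick up to k clusters, each time taking the one that adds
    the most pages not already covered by earlier picks.

    Assumption: `clusters` maps a label to a set of page ids. This is a
    generic heuristic for illustration, not the ESTC method.
    """
    covered = set()
    chosen = []
    remaining = dict(clusters)
    while remaining and len(chosen) < k:
        # Score each candidate by the number of not-yet-covered pages it adds.
        label, pages = max(remaining.items(),
                           key=lambda item: len(item[1] - covered))
        if not pages - covered:
            break  # every remaining cluster only repeats pages already shown
        chosen.append(label)
        covered |= pages
        del remaining[label]
    return chosen


# Toy data: overlapping clusters for a hypothetical ambiguous query.
clusters = {
    "jaguar car": {1, 2, 3, 4},
    "jaguar cars": {1, 2, 3},   # near-duplicate of the first cluster
    "jaguar cat": {5, 6},
    "mac os jaguar": {7},
}
selected = select_clusters(clusters, 3)
print(selected)
```

Ranking clusters by size alone would place the near-duplicate "jaguar cars" second. Counting only newly covered pages lets the smaller but distinct clusters be chosen instead, which is the kind of overlap handling the abstract says STC's selection lacks.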