Web page clustering groups semantically related web pages and is useful for categorizing, organizing, and refining search results. When only textual information is used, Suffix Tree Clustering (STC) outperforms other clustering algorithms because it exploits shared phrases and allows clusters to overlap. One problem shared by STC and similar algorithms is how to select a small set of clusters to display to the user from the very large set of clusters they generate. STC's cluster selection method is flawed because it does not handle overlapping clusters appropriately. This paper introduces a new cluster scoring function and a new cluster selection algorithm that overcome the problems caused by overlapping clusters. Combined with STC, they form a new clustering algorithm, ESTC. The experiments in this paper show that ESTC significantly outperforms STC and that, even with less data, ESTC performs similarly to a commercial clustering search engine.
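The sketch below makes the selection problem concrete. It is a minimal Python example that contrasts two approaches:

* ranking clusters independently and keeping the top k;
* a greedy selection that re-scores each candidate against the documents already covered.

This is not ESTC's scoring function or selection algorithm, because the abstract does not specify them. Several parts are hypothetical and exist only for illustration: the function names, the use of raw cluster size as the score, the `overlap_penalty` parameter, and the toy clusters.

```python
from typing import Dict, FrozenSet, List, Set

# Toy data (hypothetical): cluster label -> set of document ids.
EXAMPLE_CLUSTERS: Dict[str, FrozenSet[int]] = {
    "jaguar car": frozenset({0, 1, 2, 3}),
    "jaguar cars": frozenset({0, 1, 2}),
    "car dealer": frozenset({1, 2, 3}),
    "big cat": frozenset({4, 5}),
    "mac os": frozenset({6}),
}


def top_k_by_score(clusters: Dict[str, FrozenSet[int]], k: int) -> List[str]:
    """Rank clusters independently (here simply by size) and keep the top k.

    Overlap between clusters is ignored, so near-duplicate clusters can
    crowd out clusters that cover other documents.
    """
    return sorted(clusters, key=lambda name: (-len(clusters[name]), name))[:k]


def marginal_gain(docs: FrozenSet[int], covered: Set[int], overlap_penalty: float) -> float:
    """Reward documents not yet shown; penalize documents already covered."""
    new = len(docs - covered)
    repeated = len(docs & covered)
    return new - overlap_penalty * repeated


def greedy_overlap_aware(
    clusters: Dict[str, FrozenSet[int]], k: int, overlap_penalty: float = 1.0
) -> List[str]:
    """Pick up to k clusters one at a time, re-scoring against what is already covered."""
    selected: List[str] = []
    covered: Set[int] = set()
    remaining = dict(clusters)
    while remaining and len(selected) < k:
        # Iterating over sorted names makes ties break alphabetically,
        # because max() returns the first maximal element it sees.
        best = max(
            sorted(remaining),
            key=lambda name: marginal_gain(remaining[name], covered, overlap_penalty),
        )
        if marginal_gain(remaining[best], covered, overlap_penalty) <= 0:
            break  # no remaining cluster adds more new documents than its penalized overlap
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


if __name__ == "__main__":
    print(top_k_by_score(EXAMPLE_CLUSTERS, 3))
    print(greedy_overlap_aware(EXAMPLE_CLUSTERS, 3))
```

On the toy data the two methods behave very differently. The independent ranking returns three clusters that together cover only documents 0 to 3. The greedy variant returns "jaguar car", "big cat", and "mac os", which cover all seven documents.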