The anatomy of SnakeT: a hierarchical clustering engine for web-page snippets

Authors:
Paolo Ferragina; Antonio Gullì
Affiliations:
Università di Pisa; Università di Pisa
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004

Cited by (3):
A survey of Web clustering engines. ACM Computing Surveys (CSUR)
Comprehensible and accurate cluster labels in text clustering. Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Association rule centric clustering of web search results. MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence

Abstract

The purpose of a search engine is to retrieve, from a given textual collection, the documents deemed relevant to a user query. Typically, a user query is modeled as a set of keywords, and a document is a Web page, a PDF file, or any other file that can be parsed into a set of tokens (words). Documents are ranked in a flat list according to some measure of relevance to the user query. That list contains hyperlinks to the relevant documents, their titles, and the so-called (page or Web) snippets: document excerpts that let the user judge whether a document is indeed relevant without accessing it.
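
The flat-list model described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (tokenize, score, snippet, search), the toy relevance measure (a count of keyword occurrences), the snippet width, and the example documents and URLs are all hypothetical. It is not SnakeT's method, only the conventional setting that the abstract describes.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query_terms, doc_tokens):
    """Toy relevance measure (an assumption): count token occurrences
    that match any query keyword."""
    return sum(1 for tok in doc_tokens if tok in query_terms)

def snippet(text, query_terms, width=60):
    """Return an excerpt of at most `width` characters that starts
    shortly before the first query-keyword match."""
    for m in re.finditer(r"[A-Za-z0-9]+", text):
        if m.group().lower() in query_terms:
            start = max(0, m.start() - width // 2)
            return text[start:start + width].strip()
    return text[:width].strip()

def search(query, docs):
    """Rank matching documents into a flat list of result entries,
    each holding a hyperlink, a title, and a snippet."""
    terms = set(tokenize(query))  # the query is modeled as a set of keywords
    results = []
    for doc in docs:
        s = score(terms, tokenize(doc["text"]))
        if s > 0:
            results.append({
                "url": doc["url"],
                "title": doc["title"],
                "snippet": snippet(doc["text"], terms),
                "score": s,
            })
    # Flat list, most relevant first.
    return sorted(results, key=lambda r: r["score"], reverse=True)

# Hypothetical collection, for illustration only.
docs = [
    {"url": "http://example.org/a", "title": "Big cats",
     "text": "The jaguar is a big cat. A jaguar can reach a top speed of 80 km/h."},
    {"url": "http://example.org/b", "title": "Cars",
     "text": "The Jaguar E-Type is a British sports car."},
    {"url": "http://example.org/c", "title": "Cooking",
     "text": "Slow-roasted tomatoes need low heat."},
]

results = search("jaguar speed", docs)
for r in results:
    print(r["score"], r["title"], "|", r["snippet"])
```

In this toy collection the query "jaguar speed" returns two results on different topics (the animal and the car) in a single ranked list, so the user has to read each snippet to tell them apart.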