An overview of web data clustering practices

  • Authors:
  • Athena Vakali; Jaroslav Pokorný; Theodore Dalamagas

  • Affiliations:
  • Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece; Faculty of Mathematics and Physics, Charles University, Praha 1, Czech Republic; School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Greece

  • Venue:
  • EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
  • Year:
  • 2004

Abstract

Clustering is a challenging topic in the area of Web data management. Various forms of clustering are required in a wide range of applications, including finding mirrored Web pages, detecting copyright violations, and reporting search results in a structured way. Clustering can be performed either once offline (independently of search queries) or online (on the results of search queries). Important efforts have focused on mining Web access logs and on clustering search engine results on the fly. Online methods based on link structure and text have been applied successfully to finding pages on related topics. This paper presents an overview of the most popular methodologies and implementations for clustering either Web users or Web sources, and surveys the current status and future trends of clustering on the Web.