An Empirical Study on Keyword-based Web Site Clustering

  • Venue: IWPC '04 Proceedings of the 12th IEEE International Workshop on Program Comprehension
  • Year: 2004

Abstract

Web site evolution is characterized by a limited support to the understanding activities offered to the developers. In fact, design diagrams are often missing or outdated. A potentially interesting option is to reverse engineer high level views of Web sites from the content of the Web pages. Clustering is a valuable technique that can be used in this respect. Web pages can be clustered together based on the similarity of summary information about their content, represented as a list of automatically extracted keywords. This paper presents an empirical study that was conducted to determine the meaningfulness for Web developers of clusters automatically produced from the analysis of the Web page content. Natural Language Processing (NLP) plays a central role in content analysis and keyword extraction. Thus, a second objective of the study was to assess the contribution of some shallow NLP techniques to the clustering task.
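
The general idea of keyword-based page clustering can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the abstract does not specify the keyword extractor, the similarity measure, or the clustering algorithm, so everything below is an assumption made for illustration. Keywords are taken to be the most frequent non-stopword tokens, similarity is the Jaccard coefficient between keyword sets, and pages are grouped by linking any pair whose similarity reaches a threshold. The names `STOPWORDS`, `extract_keywords`, `jaccard`, `cluster_pages`, and the page names in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def extract_keywords(text, top_n=10):
    """Return up to top_n most frequent non-stopword tokens as a set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {word for word, _ in counts.most_common(top_n)}


def jaccard(a, b):
    """Jaccard coefficient of two keyword sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.3):
    """Group pages whose keyword sets are similar.

    pages maps a page name to its text. Two pages are linked when their
    Jaccard similarity is >= threshold; clusters are the connected
    components of those links (a simple single-link grouping).
    """
    keywords = {name: extract_keywords(text) for name, text in pages.items()}
    names = sorted(pages)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the cluster representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(keywords[a], keywords[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Usage with made-up page texts:
# cluster_pages({"products.html": "laptop price catalog",
#                "offers.html": "laptop price discount catalog"})
```

A threshold-based grouping is used here only because it is short; a hierarchical algorithm with a cut level would be an equally plausible choice, and the shallow NLP steps the study assesses would slot into `extract_keywords`.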