An Empirical Study on Keyword-based Web Site Clustering

Venue:
IWPC '04 Proceedings of the 12th IEEE International Workshop on Program Comprehension
Year:
2004

Abstract

Web site evolution is characterized by limited support for the understanding activities of developers. In fact, design diagrams are often missing or outdated. A potentially interesting option is to reverse engineer high-level views of Web sites from the content of the Web pages. Clustering is a valuable technique for this purpose: Web pages can be grouped together based on the similarity of summary information about their content, represented as a list of automatically extracted keywords.

This paper presents an empirical study conducted to determine how meaningful clusters automatically produced from the analysis of Web page content are to Web developers. Natural Language Processing (NLP) plays a central role in content analysis and keyword extraction. Thus, a second objective of the study was to assess the contribution of some shallow NLP techniques to the clustering task.
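The abstract does not state which keyword weighting, similarity measure, or clustering algorithm the study used. As a rough illustration of the general idea only (pages summarized as keyword lists, then grouped by keyword similarity), the sketch below assumes raw term frequencies after stopword removal as a crude stand-in for shallow NLP keyword extraction, cosine similarity between keyword vectors, and threshold-based single-linkage agglomerative clustering. The function names, the stopword list, and the `top_k` and `threshold` values are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system might instead use a fuller list,
# stemming, or part-of-speech filtering (the kind of shallow NLP the study assesses).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on",
             "with", "is", "are", "our", "your"}


def extract_keywords(text, top_k=10):
    """Return the top_k most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return dict(counts.most_common(top_k))


def cosine(u, v):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_pages(pages, threshold=0.3):
    """Single-linkage agglomerative clustering over keyword vectors.

    Repeatedly merges the two clusters whose most similar pair of pages
    exceeds `threshold`; stops when no such pair remains.
    """
    keywords = {name: extract_keywords(text) for name, text in pages.items()}
    clusters = [[name] for name in pages]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster similarity = max pairwise page similarity.
                sim = max(cosine(keywords[a], keywords[b])
                          for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair  # i < j, so deleting index j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    pages = {
        "cart.html": "Shopping cart: review the items in your cart before checkout.",
        "checkout.html": "Checkout: pay for the items in your shopping cart.",
        "news.html": "Company news and press releases.",
        "press.html": "Press releases and company news archive.",
    }
    print(cluster_pages(pages))
```

With these made-up pages, the shopping pages and the news pages share keywords only within their own group, so the sketch yields two clusters. Single linkage is used here only because it is short to write; other linkage criteria or weighting schemes (for example TF-IDF) would fit the same outline.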