Automatic extraction of structure, content and usage data statistics of web sites

  • Authors:
  • Ioannis Paparrizos; Vassiliki Koutsonikola; Lefteris Angelis; Athena Vakali

  • Affiliations:
  • École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Aristotle University, Thessaloniki, Greece; Aristotle University, Thessaloniki, Greece; Aristotle University, Thessaloniki, Greece

  • Venue:
  • Proceedings of the 21st ACM conference on Hypertext and hypermedia
  • Year:
  • 2010

Abstract

In this paper we present a web mining tool that automatically extracts structure, content and usage data statistics from web sites. This work is inspired by the fact that web mining comprises three axes: web structure mining, web content mining and web usage mining, which draw on structure, content and usage data, respectively. Our aim is to use the multi-threaded web crawler we developed to automatically extract from web pages the data associated with each of these three axes, and then to compute several useful descriptive statistics and apply advanced mathematical and statistical methods. We describe our system and report some experimental results.
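
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea, not the authors' tool. It assumes pages are already available as in-memory HTML strings (the hypothetical `SAMPLE_PAGES`), parses them in parallel with a thread pool, and computes two simple descriptive statistics: out-links per page (structure) and word count per page (content). Usage data, which would normally come from server logs, is left out. All names (`PageParser`, `extract`, `site_statistics`) are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from statistics import mean


class PageParser(HTMLParser):
    """Collects outgoing links (structure) and text tokens (content)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every anchor that has an href attribute.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Split text nodes on whitespace to obtain word tokens.
        self.words.extend(data.split())


def extract(item):
    """Parse one (url, html) pair and return its per-page counts."""
    url, html = item
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return url, {"out_links": len(parser.links), "words": len(parser.words)}


def site_statistics(pages, workers=4):
    """Parse all pages in a thread pool and aggregate simple statistics."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_page = dict(pool.map(extract, pages.items()))
    return {
        "pages": len(per_page),
        "mean_out_links": mean(p["out_links"] for p in per_page.values()),
        "mean_words": mean(p["words"] for p in per_page.values()),
        "per_page": per_page,
    }


# Hypothetical pages standing in for documents a crawler would fetch.
SAMPLE_PAGES = {
    "/index.html": '<html><body><a href="/a.html">A</a> '
                   '<a href="/b.html">B</a> welcome home</body></html>',
    "/a.html": '<html><body><a href="/index.html">back</a> page a</body></html>',
    "/b.html": "<html><body>leaf page</body></html>",
}

stats = site_statistics(SAMPLE_PAGES)
print(stats["pages"], stats["mean_out_links"], stats["mean_words"])
```

A real crawler would replace `SAMPLE_PAGES` with pages fetched over HTTP and would need politeness delays, URL normalization and error handling, none of which are shown here.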