Mining Local Buffer Data

  • Authors: Andrzej Siemiński
  • Affiliations: Wrocław University of Technology, Institute of Applied Informatics, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland, e-mail: andrzej.sieminski@pwr.wroc.pl
  • Venue: Proceedings of the 2008 conference on New Trends in Multimedia and Network Information Systems
  • Year: 2008

Abstract

Web mining employs data mining techniques to extract information from the Web for a variety of purposes. The usual sources of data are the log files of WWW or proxy servers. The paper examines the possibility of using the local browser buffer (cache) for that purpose. The data that can be extracted from both types of logs are compared. It turns out that, despite its limitations, the browser buffer is a rich source of unique data about user navigational habits and about the properties of the fragment of the WWW that he/she visits. The cache contains both the full body of a WWW object and the header control data sent by the server. Additionally, the cache includes some basic information about the usage pattern of each object. It is therefore possible to study the susceptibility of objects to buffering, which is measured by the CF (cacheability factor), and to study the word diversity of the Internet texts seen by the user. The CF provides an objective measure of a web site's caching potential and thus makes it possible to draw inferences about the site's latency. The word diversity study tests the compliance of Internet texts with the well-known Zipf's and Heaps' laws, which are valid for all natural languages. That part of the study could be used to optimize indexing engines or to recommend pages potentially interesting to the user.
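The abstract does not say how the compliance test with Zipf's and Heaps' laws was carried out. As a minimal sketch, assuming the text of each page body has already been extracted from the browser cache, one can build the rank-frequency table used for Zipf's law (frequency roughly proportional to 1/rank^s, with s near 1) and the vocabulary-growth curve used for Heaps' law (vocabulary size V(n) ≈ K·n^β after n tokens), then estimate the exponents by least squares on log-log scales. The tokenizer and all function names below are hypothetical illustrations, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def zipf_table(tokens: list[str]) -> list[tuple[int, str, int]]:
    """Return (rank, word, frequency) triples sorted by descending frequency."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def heaps_curve(tokens: list[str]) -> list[tuple[int, int]]:
    """Vocabulary size V(n) after reading the first n tokens, for every n."""
    seen: set[str] = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve


def fit_power_law(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log y = log K + b * log x; returns (K, b).

    Needs at least two points with distinct x values.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(my - b * mx), b


if __name__ == "__main__":
    # Hypothetical sample; real input would be page bodies read from the cache.
    tokens = tokenize("The cache stores the page and the headers of the page.")
    table = zipf_table(tokens)
    _, zipf_slope = fit_power_law([(r, f) for r, _, f in table])
    heaps_k, heaps_beta = fit_power_law(heaps_curve(tokens))
    print("top words:", table[:3])
    print(f"Zipf slope (about -s): {zipf_slope:.2f}")
    print(f"Heaps fit: K={heaps_k:.2f}, beta={heaps_beta:.2f}")
```

On a toy sentence the fitted values mean little; the laws only emerge on large corpora, such as the accumulated contents of a browser cache. A plain log-log regression is used here for simplicity; maximum-likelihood estimators are often preferred for Zipf exponents.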