Decision trees for web log mining

  • Authors:
  • Zidrina Pabarskaite

  • Affiliations:
  • Data Analysis Department, Institute of Mathematics and Informatics, Akademijos 4, Vilnius 2600, Lithuania (corresponding author; Tel.: +370 5 21 09 341; Fax: +370 5 27 29 209; E-mail: zipa@softhome.net)

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2003

Abstract

Complex and extensive web sites are becoming increasingly popular, and companies need to justify their investments in them. Analysis of web-related data is one way to provide this justification. Large amounts of data often sit in repositories unused, for simple reasons: people do not know what to do with the data, how to prepare it, or which tasks to perform to extract valuable knowledge. Commercial web mining packages do not answer all the questions that may interest a data analyst. In this paper the author suggests several hypotheses that could help to improve a web site's retention. The investigation proposes decision trees for analysing web user behaviour, including prediction of users' future actions and identification of the typical pages that lead to browsing termination. The decision tree package C4.5 was used in this study. Decision trees showed reasonable computational performance and accuracy. Experiments showed that it is possible to predict future user actions with a reasonable misclassification error and to find combinations of sequential pages that result in browsing termination. In addition, decision trees generated human-understandable rules that can be analysed further to improve the web site.
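
As a rough illustration of the approach described in the abstract, the sketch below computes the gain ratio, the split criterion C4.5 uses to choose which attribute to test at a tree node. It is not the paper's code or data: the sessions, the attribute names `prev_page` and `last_page`, and the labels `continue` and `exit` are hypothetical toy values, assuming each session has already been reduced to its last two pages plus a flag for whether browsing terminated.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on a categorical attribute (C4.5's criterion)."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(
        (len(g) / total) * math.log2(len(g) / total) for g in groups.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy sessions (assumed data, not taken from the paper).
sessions = [
    {"prev_page": "home", "last_page": "search"},
    {"prev_page": "home", "last_page": "pricing"},
    {"prev_page": "search", "last_page": "pricing"},
    {"prev_page": "search", "last_page": "search"},
    {"prev_page": "home", "last_page": "pricing"},
    {"prev_page": "search", "last_page": "search"},
]
labels = ["continue", "exit", "exit", "continue", "exit", "continue"]

# The attribute with the highest gain ratio would become the root test.
best = max(sessions[0], key=lambda a: gain_ratio(sessions, labels, a))
print("root split:", best)
```

In this toy data the most recent page separates terminated sessions from continued ones, so it would be chosen as the root test. A path from the root to a leaf can then be read as a rule of the form "if the last page is X, the user tends to leave", which is the kind of human-understandable rule the abstract refers to.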