Representation and dimensionality reduction of semantically enriched clickstreams

  • Authors: Tomáš Kliegr
  • Affiliations: University of Economics, Czech Republic
  • Venue: Ph.D. '08: Proceedings of the 2008 EDBT Ph.D. workshop
  • Year: 2008

Abstract

Semantically enriched web usage data have high dimensionality when represented as fixed-length vectors, which impairs both the performance of many data mining algorithms and comprehensibility for a human analyst. The work presented here introduces the visitor profile, a set of low-dimensional fixed-length vectors extracted from the clickstream of an individual visitor. The usability of this representation for common web usage mining tasks is demonstrated in association rule mining and clustering experiments. Since the availability of reliable pageview weights was found to be of critical importance, a supervised algorithm based on genetic programming is proposed for learning these weights from the clickstreams of converted visitors. Extraction of features from referring search engine queries is outlined as further work.