Topic analysis of web user behavior using LDA model on proxy logs

  • Authors:
  • Hiroshi Fujimoto;Minoru Etoh;Akira Kinno;Yoshikazu Akinaga

  • Affiliations:
  • NTT DOCOMO R&D Center, Kanagawa, Japan;Osaka University Cybermedia Center, Osaka, Japan;NTT DOCOMO R&D Center, Kanagawa, Japan;NTT DOCOMO R&D Center, Kanagawa, Japan

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
  • Year:
  • 2011

Abstract

We propose a framework for profiling and clustering web users with LDA topic modeling, by analogy to document analysis: users correspond to documents and their actions to words. The main technical challenge addressed here is how to represent web access actions, as monitored through a web proxy, as words. We develop a hierarchical URL dictionary generated from Yahoo! Directory and a cross-hierarchical matching method that provides automatic abstraction. We apply the proposed framework to 7500 students at Osaka University. The results include, for example, 24 topics such as "Technology Oriented", "Job Hunting", and "SNS-addict". These topics reflect the typical interest profiles of university students, and perplexity analysis is used to confirm the optimality of the framework.
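
The document/word analogy in the abstract can be illustrated with a minimal Python sketch. It is not the authors' cross-hierarchical matching method, which the abstract does not specify. As a stand-in it uses a simple longest-prefix lookup against a small, hypothetical URL dictionary. The names `URL_DICTIONARY`, `url_to_word`, `build_user_documents`, and `max_depth`, as well as the example hosts and categories, are assumptions made for illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical dictionary: URL prefix (host + path) -> category path.
# The entries are illustrative only and are not taken from Yahoo! Directory.
URL_DICTIONARY = {
    "example.com": ("Computers",),
    "example.com/jobs": ("Business", "Employment"),
    "social.example.net": ("Society", "SNS"),
}


def url_to_word(url, dictionary=URL_DICTIONARY, max_depth=2):
    """Map a URL to a category "word" by longest-prefix match.

    Returns None when no dictionary entry covers the URL.
    """
    parts = urlsplit(url)
    segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
    # Try the most specific prefix first, then back off toward the host.
    for end in range(len(segments), 0, -1):
        key = "/".join(segments[:end])
        if key in dictionary:
            # Truncating the category path abstracts the action
            # to a coarser level of the hierarchy.
            return "/".join(dictionary[key][:max_depth])
    return None


def build_user_documents(log_records):
    """Turn (user_id, url) proxy-log records into one bag of words per user."""
    documents = {}
    for user_id, url in log_records:
        word = url_to_word(url)
        if word is not None:
            documents.setdefault(user_id, Counter())[word] += 1
    return documents


# Usage: each user becomes a "document" whose "words" are category labels.
log = [
    ("u1", "http://example.com/jobs/list?id=3"),
    ("u1", "http://social.example.net/home"),
    ("u2", "http://example.com/news"),
    ("u2", "http://unknown.example.org/"),  # not in the dictionary, skipped
]
user_documents = build_user_documents(log)
```

The per-user counts form a user-by-word matrix that could be passed to any LDA implementation, with the number of topics chosen by comparing held-out perplexity across candidate values.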