Personalization from incomplete data: what you don't know can hurt

Authors:
Balaji Padmanabhan;Zhiqiang Zheng;Steven O. Kimbrough
Affiliations:
The Wharton School, University of Pennsylvania, Philadelphia, PA;The Wharton School, University of Pennsylvania, Philadelphia, PA;The Wharton School, University of Pennsylvania, Philadelphia, PA
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Abstract

Clickstream data collected at any web site (site-centric data) is inherently incomplete, since it does not capture users' browsing behavior across sites (user-centric data). Hence, models learned from such data may be subject to limitations, the nature of which has not been well studied. Understanding the limitations is particularly important since most current personalization techniques are based on site-centric data only. In this paper, we empirically examine the implications of learning from incomplete data in the context of two specific problems: (a) predicting if the remainder of any given session will result in a purchase and (b) predicting if a given user will make a purchase at any future session. For each of these problems we present new algorithms for fast and accurate data preprocessing of clickstream data. Based on a comprehensive experiment on user-level clickstream data gathered from 20,000 users' browsing behavior, we demonstrate that models built on user-centric data outperform models built on site-centric data for both prediction tasks.
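
The distinction between site-centric and user-centric data, and the two prediction targets, can be illustrated with a minimal Python sketch. This is not the paper's preprocessing algorithm, which the abstract does not describe. The `Click` record, its field names, the per-user session index, and the choice to count only purchases made at the site of interest are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Click:
    """Hypothetical clickstream record; the fields are illustrative assumptions."""
    user: str
    session: int    # assumed per-user session index (1, 2, ...)
    site: str
    purchase: bool  # True if this click completed a purchase


def site_centric_view(clicks: List[Click], site: str) -> List[Click]:
    """Keep only the clicks that `site` could have logged itself."""
    return [c for c in clicks if c.site == site]


def purchase_in_remainder(session_clicks: List[Click], seen: int, site: str) -> bool:
    """Label for problem (a): after the first `seen` clicks of a session,
    does the rest of that session contain a purchase at `site`?"""
    return any(c.purchase and c.site == site for c in session_clicks[seen:])


def purchase_in_future_session(clicks: List[Click], user: str,
                               after_session: int, site: str) -> bool:
    """Label for problem (b): does `user` purchase at `site` in any
    session later than `after_session`?"""
    return any(
        c.user == user and c.session > after_session
        and c.purchase and c.site == site
        for c in clicks
    )


# Toy user-centric log: every click of every user, across sites.
user_centric = [
    Click("u1", 1, "siteA", False),
    Click("u1", 1, "siteB", False),
    Click("u1", 1, "siteB", False),
    Click("u1", 1, "siteA", True),
    Click("u1", 2, "siteB", False),
    Click("u2", 1, "siteA", False),
    Click("u2", 2, "siteA", True),
]

# What siteA alone observes: the siteB clicks are invisible to it.
site_a_only = site_centric_view(user_centric, "siteA")

u1_session_1 = [c for c in user_centric if c.user == "u1" and c.session == 1]
label_a = purchase_in_remainder(u1_session_1, seen=2, site="siteA")
label_b_u1 = purchase_in_future_session(user_centric, "u1", 1, "siteA")
label_b_u2 = purchase_in_future_session(user_centric, "u2", 1, "siteA")
```

In this toy log, a model trained on `site_a_only` never sees that `u1` visited `siteB` between two `siteA` clicks, whereas a model trained on `user_centric` can use that cross-site behavior as a feature. This is the kind of information gap the abstract refers to.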