Session boundary detection for association rule learning using n-gram language models

Authors:
Xiangji Huang;Fuchun Peng;Aijun An;Dale Schuurmans;Nick Cercone
Affiliations:
School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada;School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada;Department of Computer Science, York University, Toronto, Ontario, Canada;School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada;Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
Venue:
AI'03 Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence
Year:
2003

Abstract

We present a statistical method that uses n-gram language models to identify session boundaries in a large collection of Livelink log data. The identified sessions are then used for association rule learning. Unlike the traditional ad hoc timeout method, which identifies sessions with fixed time thresholds, our method takes an information-theoretic approach that provides a natural technique for dynamic session identification. We evaluate the effectiveness of our approach with respect to four different interestingness measures. Our method yields a significant improvement in every interestingness measure, with average improvements ranging from 26.6% to 39% over the best results obtained with standard timeout methods.
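The abstract contrasts fixed-threshold timeout sessionization with an n-gram, information-theoretic criterion but does not state the exact boundary rule. The Python sketch below illustrates that contrast under clearly labeled assumptions: a bigram model (n = 2) with add-one smoothing, a hypothetical `<s>` session-start symbol, and a hypothetical rule that places a boundary when adding the next object raises the current session's per-object entropy by more than a relative `threshold`. The names `timeout_sessions`, `BigramModel`, and `lm_sessions` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def timeout_sessions(events: Sequence[Tuple[float, str]], timeout: float) -> List[List[str]]:
    """Baseline: open a new session whenever the gap between consecutive
    requests exceeds a fixed `timeout` (in seconds)."""
    sessions: List[List[str]] = []
    last_t = None
    for t, obj in events:
        if last_t is None or t - last_t > timeout:
            sessions.append([])
        sessions[-1].append(obj)
        last_t = t
    return sessions


class BigramModel:
    """Add-one smoothed bigram model over requested objects.
    ASSUMPTION: n = 2 and Laplace smoothing are illustrative choices,
    not necessarily those used in the paper."""

    START = "<s>"  # hypothetical session-start symbol

    def __init__(self, training: Sequence[Sequence[str]]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = {self.START}
        for seq in training:
            padded = [self.START, *seq]
            vocab.update(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab_size = len(vocab)

    def prob(self, prev: str, cur: str) -> float:
        # Laplace estimate of P(cur | prev); never zero, so log2 is safe.
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )

    def entropy(self, seq: Sequence[str]) -> float:
        """Per-object cross-entropy (in bits) of a non-empty sequence."""
        prev, total = self.START, 0.0
        for obj in seq:
            total -= math.log2(self.prob(prev, obj))
            prev = obj
        return total / len(seq)


def lm_sessions(model: BigramModel, objects: Sequence[str], threshold: float) -> List[List[str]]:
    """Place a boundary before an object whose addition raises the current
    session's entropy by more than `threshold` (relative change).
    ASSUMPTION: this relative-change rule is a hypothetical stand-in."""
    sessions: List[List[str]] = []
    current: List[str] = []
    for obj in objects:
        if current:
            h_old = model.entropy(current)
            h_new = model.entropy(current + [obj])
            if h_old > 0 and (h_new - h_old) / h_old > threshold:
                sessions.append(current)
                current = []
        current.append(obj)
    if current:
        sessions.append(current)
    return sessions
```

In this sketch the entropy criterion looks at which objects are requested rather than when they are requested, so it can separate two unrelated browsing runs that fall inside a single timeout window, which a fixed time threshold cannot do.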