A data cube model for prediction-based web prefetching

Authors:
Qiang Yang; Joshua Zhexue Huang; Michael Ng
Affiliations:
Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong; E-Business Technology Institute, The University of Hong Kong, Hong Kong; Department of Mathematics, The University of Hong Kong, Hong Kong
Venue:
Journal of Intelligent Information Systems - Special issue on web intelligence
Year:
2003

Abstract

Reducing web latency is one of the primary concerns of Internet research. Web caching and web prefetching are two effective techniques for latency reduction. A primary method for intelligent prefetching is to rank candidate web documents using prediction models trained on past web server and proxy server log data, and to prefetch the highest-ranked objects. For this method to work well, the prediction model must be updated constantly, and different queries must be answered efficiently. In this paper we present a data cube model that represents web access sessions for data mining in support of prediction model construction. The cube model organizes session data into three dimensions. With the data cube in place, we apply efficient data mining algorithms for clustering and correlation analysis. The resulting web page clusters can then be used to guide the prefetching system. We also propose an integrated web caching and web prefetching model, in which the issues of prefetching aggressiveness, replacement policy, and increased network traffic are addressed together in a single framework. The core of our integrated solution is a prediction model based on statistical correlation between web objects. This model can be updated frequently by querying the data cube of web server logs. To our knowledge, this integrated data cube and prediction-based prefetching framework is the first such effort.
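
The idea of ranking prefetch candidates by querying a cube of session data can be illustrated with a minimal sketch. The abstract does not name the three dimensions or the correlation measure, so everything here is an assumption made for illustration: the cube is indexed by (session, position in session, attribute), the sample sessions are invented, and simple next-page transition frequency stands in for the paper's statistical correlation. The function names `build_cube`, `transition_counts`, and `rank_prefetch` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sessions: each is an ordered list of (page, seconds_on_page).
SESSIONS = [
    [("/home", 5), ("/news", 30), ("/sports", 12)],
    [("/home", 3), ("/news", 20)],
    [("/home", 8), ("/shop", 40)],
]


def build_cube(sessions):
    """Index visits by (session, position, attribute) -> value.

    The choice of these three dimensions is an assumption, not the paper's.
    """
    cube = {}
    for s, visits in enumerate(sessions):
        for pos, (page, secs) in enumerate(visits):
            cube[(s, pos, "page")] = page
            cube[(s, pos, "seconds")] = secs
    return cube


def transition_counts(cube):
    """Count page -> next-page transitions from the 'page' attribute slice."""
    counts = defaultdict(lambda: defaultdict(int))
    for (s, pos, attr), page in cube.items():
        if attr != "page":
            continue
        nxt = cube.get((s, pos + 1, "page"))
        if nxt is not None:
            counts[page][nxt] += 1
    return counts


def rank_prefetch(counts, current, top_k=2):
    """Rank candidates by estimated P(next | current), highest first."""
    followers = counts.get(current, {})
    total = sum(followers.values())
    # Sort by descending count, then by page name for a stable order.
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(page, n / total) for page, n in ranked[:top_k]]


cube = build_cube(SESSIONS)
counts = transition_counts(cube)
ranking = rank_prefetch(counts, "/home")
```

In this toy setup, a prefetcher would fetch the first few entries of `ranking` while the user is still reading the current page. Rebuilding `counts` from the cube whenever new log data arrives is one simple way to keep such a model up to date.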