Accurate prediction of changing web page content improves a variety of retrieval and web-related components. For example, given such a prediction algorithm, one can design both a better crawling strategy that recrawls pages only when necessary and a proactive personalization mechanism that pushes content associated with user revisitation directly to the user. While many techniques for modeling change have focused simply on past change frequency, our work goes beyond that by additionally studying how useful the following signals are for page change prediction: the page's content; the degree of, and relationship among, the observed changes of the page being predicted; and the page's relatedness to other pages, together with the similarity in the types of changes they undergo. We present an expert prediction framework that incorporates the information from these other signals more effectively than standard ensemble or basic relational learning techniques. In an empirical analysis that includes a comparison with common approaches, we find that using page content as well as related pages significantly improves prediction accuracy. We present numerous similarity metrics to identify related pages and focus specifically on measures of temporal content similarity. We observe that the different metrics yield related pages that are qualitatively different in nature and that have different effects on prediction performance.
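
To make the contrast with frequency-only modeling concrete, the following is a minimal sketch of a past-change-frequency baseline, plus one very simple way to compare the change histories of two pages. It is not the expert prediction framework or any of the similarity metrics described above. The function names, the 0/1 encoding of change histories, the 0.5 threshold, and the Jaccard-style overlap are all assumptions chosen for illustration.

```python
from typing import Sequence


def change_rate(history: Sequence[int]) -> float:
    """Fraction of past crawl intervals in which the page changed.

    `history` is a hypothetical encoding: 1 if the page changed in that
    interval, 0 otherwise.
    """
    if not history:
        return 0.0
    return sum(history) / len(history)


def predict_change(history: Sequence[int], threshold: float = 0.5) -> bool:
    """Frequency-only baseline: predict a change if the past rate is high.

    The 0.5 threshold is an arbitrary illustrative choice.
    """
    return change_rate(history) >= threshold


def change_overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard overlap of the intervals in which two pages changed.

    A crude stand-in for "pages that change together"; it ignores page
    content entirely.
    """
    n = min(len(a), len(b))
    both = sum(1 for i in range(n) if a[i] and b[i])
    either = sum(1 for i in range(n) if a[i] or b[i])
    return both / either if either else 0.0


# Made-up change histories for two pages over six crawl intervals.
page = [1, 0, 1, 1, 0, 1]
neighbor = [1, 0, 1, 0, 0, 1]

print(predict_change(page))            # True (4 of 6 intervals changed)
print(change_overlap(page, neighbor))  # 0.75
```

A baseline like this sees only how often a page changed. It has no access to what changed or to how related pages behave, which are the additional signals studied here.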