Since Web sites are autonomous and independently updated, applications that keep replicas of Web data, such as Web warehouses and search engines, must periodically poll the sites and check for changes. Because this polling is resource-intensive, keeping the copies up to date requires efficient update schedules that adapt to the change rate of the pages and avoid visiting pages that have not been modified since the last visit. In this paper, we propose a new approach that learns to predict the change behavior of Web pages based on both the static features and the change history of pages, and refreshes the copies accordingly. Experiments using real-world data show that our technique leads to substantial performance improvements compared to previously proposed approaches.
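
To make the idea of a prediction-driven refresh schedule concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes that a page's static features have already been reduced to a single prior change probability (`static_prior`), blends that prior with the observed change history by treating it as a few pseudo-visits, and spends a fixed polling budget on the pages most likely to have changed. The names `Page`, `static_prior`, `prior_weight`, `change_probability`, and `pick_pages_to_refresh` are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    url: str
    # Hypothetical: change probability guessed from static page features.
    static_prior: float
    # One entry per past visit; True means the page had changed.
    history: List[bool] = field(default_factory=list)


def change_probability(page: Page, prior_weight: float = 2.0) -> float:
    """Blend the static prior with the observed change rate.

    The prior counts as `prior_weight` pseudo-visits, so it dominates
    for pages with little history and fades as visits accumulate.
    """
    changes = sum(page.history)
    visits = len(page.history)
    return (changes + prior_weight * page.static_prior) / (visits + prior_weight)


def pick_pages_to_refresh(pages: List[Page], budget: int) -> List[Page]:
    """Return the `budget` pages with the highest estimated change probability."""
    ranked = sorted(pages, key=change_probability, reverse=True)
    return ranked[:budget]
```

With this blending, a page that has never been visited is ranked purely by its static prior, while a page with a long history is ranked almost entirely by how often it was seen to change. A learned classifier over static and history features could replace `change_probability` without changing the scheduling step.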