An algorithm for concurrency control and recovery in replicated distributed databases
ACM Transactions on Database Systems (TODS)
Data caching issues in an information retrieval system
ACM Transactions on Database Systems (TODS)
Providing high availability using lazy replication
ACM Transactions on Computer Systems (TOCS)
Bounded ignorance: a technique for increasing concurrency in a replicated system
ACM Transactions on Database Systems (TODS)
Supporting multiple view maintenance policies
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Synchronizing a database to improve freshness
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An adaptive model for optimizing performance of an incremental web crawler
Proceedings of the 10th international conference on World Wide Web
Applying the golden rule of sampling for query estimation
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Best-effort cache synchronization with source cooperation
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Mercator: A scalable, extensible Web crawler
World Wide Web
A Query Sampling Method of Estimating Local Cost Parameters in a Multidatabase System
Proceedings of the Tenth International Conference on Data Engineering
The Evolution of the Web and Implications for an Incremental Crawler
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Gambling in a rigged casino: The adversarial multi-armed bandit problem
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Performance and cost tradeoffs in Web search
ADC '04 Proceedings of the 15th Australasian database conference - Volume 27
Modeling and Managing Content Changes in Text Databases
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
The infocious web search engine: improving web searching through linguistic analysis
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Proceedings of the 2nd international workshop on Information quality in information systems
Looking at both the present and the past to efficiently update replicas of web content
Proceedings of the 7th annual ACM international workshop on Web information and data management
Adaptive pull-based policies for wide area data delivery
ACM Transactions on Database Systems (TODS)
Temporal multi-page summarization
Web Intelligence and Agent Systems
Designing efficient sampling techniques to detect webpage updates
Proceedings of the 16th international conference on World Wide Web
Efficient Monitoring Algorithm for Fast News Alerts
IEEE Transactions on Knowledge and Data Engineering
Modeling and managing changes in text databases
ACM Transactions on Database Systems (TODS)
Designing clustering-based web crawling policies for search engine crawlers
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Topical web crawling using weighted anchor text and web page change detection techniques
WSEAS Transactions on Information Science and Applications
SHARC: framework for quality-conscious web archiving
Proceedings of the VLDB Endowment
Foundations and Trends in Information Retrieval
Efficiently detecting webpage updates using samples
ICWE'07 Proceedings of the 7th international conference on Web engineering
Clustering-based incremental web crawling
ACM Transactions on Information Systems (TOIS)
The SHARC framework for data quality in Web archiving
The VLDB Journal — The International Journal on Very Large Data Bases
Best-effort refresh strategies for content-based RSS feed aggregation
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
State transfer graph: an efficient tool for webview maintenance
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Temporal ranking of search engine results
WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
A hybrid approach for refreshing web page repositories
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Adaptive change estimation in the context of online market monitoring
EUROCAST'11 Proceedings of the 13th international conference on Computer Aided Systems Theory - Volume Part I
Predicting content change on the web
Proceedings of the sixth ACM international conference on Web search and data mining
Timely crawling of high-quality ephemeral new content
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
CUVIM: extracting fresh information from social network
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
A Hybrid Approach for Web Change Detection
International Journal of Information Technology and Web Engineering
In large-scale data-intensive environments, such as the World-Wide Web or data warehousing, we often make local copies of remote data sources. Because network and computational resources are limited, however, it is often difficult to monitor the sources constantly for changes and to download the changed data items to the copies. In this scenario, our goal is to detect as many changes as we can using the fixed download resources that we have. In this paper we propose three sampling-based download policies that identify changed data items more effectively. In our sampling-based approach, we first sample a small number of data items from each data source and then download more data items from the sources with more changed samples. We analyze the effectiveness of the sampling-based policies and compare them to existing policies, including the state-of-the-art frequency-based policy in [8, 11]. Our experiments on synthetic and real-world data show the relative merits of the various policies and the great potential of our sampling-based policy. In certain cases, our sampling-based policy could download twice as many changed items as the best existing policy.
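The two-phase idea in the abstract (sample a few items per source, then favor the sources whose samples changed most) can be illustrated with a minimal sketch. This is not one of the paper's three policies, which the abstract does not specify; it is a simple greedy variant under stated assumptions. The function name `sampling_based_download`, its parameters, the representation of changes as a set of `(source, item)` pairs, and the choice to charge sample downloads against the same budget are all hypothetical.

```python
import random


def sampling_based_download(sources, changed, sample_size, budget, seed=0):
    """Greedy sketch of a sampling-based download policy (illustrative only).

    sources:     dict mapping a source name to a list of item ids
    changed:     set of (source, item) pairs that have changed; in practice
                 this is only learned by downloading the item
    sample_size: items sampled per source in the first phase
    budget:      total downloads allowed, samples included (assumption)
    Returns (detected, used): the changed items found and downloads spent.
    """
    rng = random.Random(seed)
    used = 0
    detected = []
    ratio = {}  # fraction of sampled items that had changed, per source
    rest = {}   # items not yet downloaded, per source

    # Phase 1: download a small random sample from every source.
    for name, items in sources.items():
        k = min(sample_size, len(items), budget - used)
        sample = rng.sample(items, k)
        sampled = set(sample)
        hits = [item for item in sample if (name, item) in changed]
        detected.extend((name, item) for item in hits)
        used += k
        ratio[name] = len(hits) / k if k else 0.0
        rest[name] = [item for item in items if item not in sampled]

    # Phase 2: spend what is left on the sources whose samples changed most.
    for name in sorted(rest, key=lambda n: ratio[n], reverse=True):
        take = rest[name][: budget - used]
        detected.extend((name, item) for item in take if (name, item) in changed)
        used += len(take)
        if used >= budget:
            break

    return detected, used
```

With two ten-item sources where every item of one source has changed and none of the other, a sample of two items per source is enough to steer the whole remaining budget to the changed source. A proportional allocation, which splits the remaining budget according to the sampled change ratios instead of ranking sources, would be an equally plausible reading of the abstract.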