Large amounts of (often valuable) information are stored in web-accessible text databases. "Metasearchers" provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not need to change over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this paper, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use "survival analysis" techniques in general, and Cox's proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real web databases.
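
For orientation, the general textbook form of a Cox proportional hazards model can be written as below. This is a sketch of the standard model only, not necessarily the parameterization used in the paper: the covariate vector x (features of a database), the coefficient vector \beta, the baseline hazard h_0, and the staleness threshold \tau are illustrative symbols that do not appear in the abstract.

```latex
% Hazard of a summary "changing" at time t for a database with covariates x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Implied survival function: probability that the summary is still current at time t
S(t \mid x) = S_0(t)^{\exp\left(\beta^{\top} x\right)},
\qquad S_0(t) = \exp\!\left(-\int_0^{t} h_0(u)\,du\right)

% One possible update rule (an assumption): refresh the summary at the first time
% its survival probability falls to a chosen threshold \tau
T(x) = \inf\{\, t \ge 0 : S(t \mid x) \le \tau \,\}
```

Under this reading, a database whose covariates yield a larger \beta^{\top} x has a survival curve that decays faster, so its summary would be scheduled for an earlier update.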