Modeling and Managing Content Changes in Text Databases
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Large amounts of (often valuable) information are stored in web-accessible text databases. “Metasearchers” provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not evolve over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this article, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use “survival analysis” techniques in general, and Cox's proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real web databases.
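To make the scheduling idea concrete, here is a minimal pure-Python sketch under a simplifying assumption: each database's content summary becomes stale according to an exponential (memoryless) change model, which is a special case of the survival-analysis framework the article uses (the article's actual model is the richer Cox proportional hazards regression, which conditions the hazard on database features). The function names and the 50% freshness threshold are illustrative, not from the article.

```python
import math

def estimate_change_rate(num_changes, observation_weeks):
    """Maximum-likelihood estimate of the change rate (lambda, changes/week)
    under an exponential change model, from an observation window."""
    return num_changes / observation_weeks

def survival(t, rate):
    """Probability the summary is still fresh t weeks after an update:
    S(t) = exp(-lambda * t)."""
    return math.exp(-rate * t)

def next_update_in(rate, freshness_threshold=0.5):
    """Weeks until the freshness probability S(t) drops below the
    threshold: solve exp(-lambda * t) = threshold for t."""
    return -math.log(freshness_threshold) / rate

# Example: a database whose summary was observed to change 4 times
# over a 52-week study period (numbers are hypothetical).
rate = estimate_change_rate(4, 52)   # ~0.077 changes per week
t = next_update_in(rate, 0.5)        # ~9 weeks until P(fresh) < 50%
```

A scheduler built on this sketch would contact each database only when its predicted freshness falls below the threshold, which is the "contact databases only when needed" behavior the abstract describes; the Cox model refines this by letting the hazard rate vary across databases.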