Continuous queries are queries whose responses must be continuously updated as the sources of interest change. Such queries arise, for instance, in online decision making, e.g., traffic flow control and weather monitoring. The problem of keeping the responses current reduces to deciding how often to visit a source to determine whether and how it has been modified, so that earlier responses can be updated accordingly. On the surface, this seems similar to the crawling problem, since crawlers attempt to keep indexes up to date as pages change and users pose search queries. We show that this is not the case, owing both to inherent differences between the two problems and to differences in the performance metric. We propose, develop, and evaluate a novel multi-phase solution, Continuous Adaptive Monitoring (CAM), to the problem of maintaining the currency of query results. Some of the important phases are as follows. In the tracking phase, changes to an initially identified set of relevant pages are tracked. From the observed change characteristics of these pages, a probabilistic model of their change behavior is formulated, and weights are assigned to pages to denote their importance for the current queries. In the next phase, the resource allocation phase, the resources needed to continuously monitor these pages for changes are allocated based on these statistics. Given these resource allocations, the scheduling phase produces an optimal achievable schedule for the monitoring tasks. An experimental evaluation of our approach against prior approaches for crawling dynamic web pages shows the effectiveness of CAM for monitoring dynamic changes. For example, by monitoring just 5% of the page changes, CAM is able to return 90% of the changed information to the users.
The experiments also yield some interesting observations on the differences between crawling (to build an index) and change tracking (to respond to continuous queries).
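To make the three phases concrete, the following is a minimal sketch of a tracking, allocation, and scheduling pipeline. It is not the CAM algorithm itself: the `Page` fields, the frequency-based change estimate, the proportional allocation rule, and the evenly spaced schedule are all simplifying assumptions introduced here for illustration, and the abstract does not specify how CAM computes any of them.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record for one monitored page (field names are assumptions)."""
    url: str
    weight: float          # importance for the current queries, assumed given
    observed_changes: int  # changes seen during the tracking phase
    tracking_visits: int   # visits made during the tracking phase


def change_probability(page: Page) -> float:
    """Tracking phase: a simple frequency estimate of the per-visit change probability."""
    if page.tracking_visits == 0:
        return 0.0
    return page.observed_changes / page.tracking_visits


def allocate(pages: list[Page], budget: int) -> dict[str, int]:
    """Resource allocation phase: split `budget` visits in proportion to
    weight * change probability (a heuristic stand-in, not an optimizer)."""
    scores = {p.url: p.weight * change_probability(p) for p in pages}
    total = sum(scores.values())
    if total == 0:
        return {p.url: 0 for p in pages}
    raw = {url: budget * score / total for url, score in scores.items()}
    alloc = {url: int(share) for url, share in raw.items()}
    # Largest-remainder rounding so the allocations sum exactly to the budget.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda url: raw[url] - alloc[url], reverse=True)
    for url in by_remainder[:leftover]:
        alloc[url] += 1
    return alloc


def schedule(alloc: dict[str, int], horizon: float) -> list[tuple[float, str]]:
    """Scheduling phase: spread each page's visits evenly over [0, horizon)."""
    visits = []
    for url, count in alloc.items():
        for k in range(count):
            visits.append(((k + 0.5) * horizon / count, url))
    return sorted(visits)


if __name__ == "__main__":
    pages = [
        Page("http://example.org/a", weight=1.0, observed_changes=8, tracking_visits=10),
        Page("http://example.org/b", weight=1.0, observed_changes=2, tracking_visits=10),
    ]
    plan = allocate(pages, budget=10)
    for time, url in schedule(plan, horizon=60.0):
        print(f"{time:6.1f}  {url}")
```

Under these assumptions, a page that changes often and matters to the current queries receives most of the monitoring budget, while a rarely changing page is visited only occasionally. A real system would replace the proportional rule with whatever optimization its performance metric calls for.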