This paper presents a new challenge for Web monitoring tools: building a system that can monitor entire web sites effectively. Such a system could be used to discover "silent news" hidden within corporate web sites. Examples of silent news include a reorganization of a company's executive team or the retirement of a product line. ChangeDetector, an implemented prototype, addresses this challenge by incorporating a number of machine learning techniques. The principal backend components of ChangeDetector all rely on machine learning: intelligent crawling, page classification, and entity-based change detection. Intelligent crawling enables ChangeDetector to selectively crawl the most relevant pages of very large sites. Classification allows change detection to be filtered by topic. Entity extraction over changed pages allows change detection to be filtered by semantic concepts, such as person names, dates, addresses, and phone numbers. Finally, the front end provides a flexible way for subscribers to query the database of detected changes and pinpoint those most likely to be of interest.
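The abstract does not say how changed pages are compared or how entities are extracted from them. As a minimal sketch of entity-based change filtering, under those unknowns, the Python below diffs two versions of a page line by line with the standard library's `difflib` and then scans only the added lines for entities of the types a subscriber asked for. The `ENTITY_PATTERNS` table, its regular expressions, and the `added_lines` and `entity_changes` helpers are hypothetical stand-ins: ChangeDetector's extractors are learned models, not hand-written patterns.

```python
import difflib
import re

# Hypothetical regex patterns standing in for the learned entity extractors
# described in the abstract; a real system would use trained models.
ENTITY_PATTERNS = {
    "date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}\b"
    ),
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "person": re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return lines that appear in the new page version but not the old one."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes inserted lines with "+ "; "- ", "  ", and "? " lines are skipped.
    return [line[2:] for line in diff if line.startswith("+ ")]


def entity_changes(old_text: str, new_text: str, wanted: list[str]) -> dict[str, list[str]]:
    """Map each requested entity type to the matches found in newly added lines."""
    changes: dict[str, list[str]] = {}
    for line in added_lines(old_text, new_text):
        for kind in wanted:
            for match in ENTITY_PATTERNS[kind].findall(line):
                changes.setdefault(kind, []).append(match)
    return changes


old_page = "Contact: (555) 123-4567\nCEO: Dr. Alice Smith"
new_page = "Contact: (555) 123-4567\nCEO: Dr. Robert Jones\nEffective March 1, 2002"
print(entity_changes(old_page, new_page, ["person", "date"]))
```

Because only added lines are scanned, the unchanged phone number is never reported. A subscriber who asks only for `["phone"]` on these two versions gets an empty result, which is the kind of semantic filtering the abstract describes.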