The publication time of a page can strongly affect its relevance to a query, especially for time-sensitive pages such as news items. For news search engines, the publication time of a news item can usually be found in the returned search result records. In this paper, we introduce a method that automatically extracts the publication time of each news story returned by a news search engine, based on several important observations we made. We also introduce a wrapper implementation of the extraction method. Experimental results on data collected from 50 news search engines show that the method is effective and that the wrapper implementation improves both extraction accuracy and extraction efficiency.
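To make the task concrete, the following is a minimal sketch of pulling a publication time out of the text of a single search result record. It is not the method or the wrapper described in the paper, whose observations are not spelled out in this abstract. The function name `extract_publication_time`, the two regular expressions, and the sample record strings are all hypothetical, and the sketch assumes times appear either as a relative phrase ("3 hours ago") or as an English month-day-year date.

```python
import re
from datetime import datetime, timedelta

# Hypothetical patterns for illustration only; real search result
# records use many more date and time formats than these two.
_RELATIVE = re.compile(r"\b(\d+)\s+(minute|hour|day)s?\s+ago\b", re.IGNORECASE)
_ABSOLUTE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
    r"\s+(\d{1,2}),\s+(\d{4})\b"
)
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def extract_publication_time(record_text, now):
    """Return a datetime found in one search result record, or None.

    `now` is the time the results were retrieved; it is needed to
    resolve relative phrases such as "3 hours ago".
    """
    match = _RELATIVE.search(record_text)
    if match:
        amount = int(match.group(1))
        unit = match.group(2).lower() + "s"  # "hour" -> "hours"
        return now - timedelta(**{unit: amount})

    match = _ABSOLUTE.search(record_text)
    if match:
        try:
            return datetime(
                int(match.group(3)), _MONTHS[match.group(1)], int(match.group(2))
            )
        except ValueError:
            # The text looked like a date but is not a valid one.
            return None

    return None
```

A call such as `extract_publication_time("Reuters - 3 hours ago", datetime(2009, 3, 5, 12, 0))` resolves the relative phrase against the retrieval time. A production extractor would also need to handle time zones, locale-specific formats, and records that contain several candidate dates.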