Traditional methods for extracting the content of Web news articles are time-consuming and require considerable maintenance, because they analyze the layout of news pages to generate wrappers, either manually or automatically. In this paper, we propose a relevance-based analysis method that extracts news article content from news pages without analyzing the page layout before extraction. The method is applicable to general news pages, and we present implementations of news extraction for several kinds of news sources.
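This abstract does not specify how relevance is measured, so the sketch below only illustrates the general idea of selecting article text by relevance rather than by layout. It assumes, hypothetically, that the page has already been split into plain-text blocks and that a block's relevance is the fraction of headline words it contains. The helper names (`tokenize`, `relevance`, `extract_article`) and the `threshold` value are illustrative assumptions, not the authors' method.

```python
import re


def tokenize(text: str) -> set[str]:
    # Hypothetical tokenizer: lowercase alphanumeric word tokens, no stemming.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(block: str, title: str) -> float:
    # Assumed relevance measure: fraction of title words that also occur in the block.
    title_words = tokenize(title)
    if not title_words:
        return 0.0
    return len(title_words & tokenize(block)) / len(title_words)


def extract_article(title: str, blocks: list[str], threshold: float = 0.3) -> list[str]:
    # Keep blocks whose relevance meets the (assumed) threshold, in page order.
    # No wrapper is generated and no layout information is consulted.
    return [b for b in blocks if relevance(b, title) >= threshold]


if __name__ == "__main__":
    title = "City council approves new bike lanes"
    blocks = [
        "Home | Sports | Weather",
        "The city council voted on Tuesday to approve new bike lanes downtown.",
        "Advertisement: buy shoes now",
        "Council members said the bike lanes will open next spring.",
    ]
    for block in extract_article(title, blocks):
        print(block)
```

Run on the sample page, the navigation bar and the advertisement share no words with the headline and are dropped, while the two story paragraphs are kept. A real system would need a more robust relevance measure (for example stemming, so that "approve" matches "approves"), which this sketch deliberately omits.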