Extracting the content of Web news articles is vital for news indexing and search services. Most traditional methods must analyze the layout of news pages in order to generate wrappers, either manually or automatically. This is costly work, and the wrappers need considerable maintenance when extraction runs over a long period of time. In this paper, we construct an automatic Web news article content extraction system based on RSS feeds. We propose an effective and efficient algorithm that extracts the article content from news pages without analyzing the news sites beforehand. To detect the article content, we calculate the relevance between the news title and each sentence in the news page. Our approach is applicable to general types of news RSS feeds and is independent of news page layout. Our experimental results show that our approach can extract news article content automatically, accurately and constantly.
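The title-to-sentence relevance idea can be illustrated with a minimal sketch. The abstract does not state which relevance measure the system uses, so the token-overlap score below is an assumption chosen for simplicity, and the names `tokenize`, `title_relevance`, `extract_candidates` and the `threshold` value are hypothetical. The title is assumed to come from an RSS feed item, and the sentences are assumed to have already been split out of the linked news page.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def title_relevance(title, sentence):
    """Fraction of title tokens that also occur in the sentence (0.0 to 1.0).

    Assumption: plain token overlap stands in for the relevance measure,
    which the abstract does not specify.
    """
    title_tokens = tokenize(title)
    if not title_tokens:
        return 0.0
    return len(title_tokens & tokenize(sentence)) / len(title_tokens)

def extract_candidates(title, sentences, threshold=0.3):
    """Keep sentences whose relevance to the title meets the threshold.

    The threshold of 0.3 is an illustrative value, not one from the paper.
    """
    return [s for s in sentences if title_relevance(title, s) >= threshold]

# Hypothetical input: an RSS item title and sentences taken from its page.
title = "City council approves new bridge"
sentences = [
    "Home | World | Sports | Login",
    "The city council voted on Monday to approve a new bridge over the river.",
    "Subscribe to our newsletter for daily updates.",
]
selected = extract_candidates(title, sentences)
```

In this example only the second sentence shares words with the title, so navigation and subscription text are filtered out without any knowledge of the page layout. A real system would likely also need to keep article sentences that do not repeat title words, for instance by considering their position relative to high-scoring sentences.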