RSS feed generation from legacy HTML pages

  • Authors:
  • Jun Wang;Kanji Uchino;Tetsuro Takahashi;Seishi Okamoto

  • Affiliations:
  • Fujitsu R&D Center Co., Ltd., Beijing, China;Fujitsu Laboratories, Ltd., Kanagawa, Japan;Fujitsu Laboratories, Ltd., Kanagawa, Japan;Fujitsu Laboratories, Ltd., Kanagawa, Japan

  • Venue:
  • APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Abstract

Although RSS is a promising way to track and personalize the flow of new Web information, many current Web sites do not yet provide RSS feeds. Convenient approaches to “RSSify” suitable existing Web content have therefore become a pressing need. This paper presents a system that translates semi-structured HTML pages into structured RSS feeds, using different approaches depending on the features of the HTML pages. For information items with a release time, the system provides an automatic approach based on time pattern discovery. Another automatic approach, based on repeated tag pattern mining, converts regular pages that lack a time pattern. A semi-automatic approach based on labelling is available for processing irregular pages, or specific sections of Web pages, according to the user’s requirements. Experimental results and practical applications show that our system is efficient and effective in facilitating RSS feed generation.
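
To make the time-pattern idea concrete, the following is a minimal Python sketch and not the authors' system, whose details the abstract does not give. It assumes that each news item appears as a date in page text followed by a link, that dates are written as YYYY-MM-DD or YYYY/MM/DD, and that midnight UTC is an acceptable publication time. The names `DatedLinkParser` and `build_rss` are hypothetical and introduced only for illustration.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

# Assumption for this sketch: dates look like YYYY-MM-DD or YYYY/MM/DD.
DATE_RE = re.compile(r"\b(\d{4})[-/](\d{1,2})[-/](\d{1,2})\b")


class DatedLinkParser(HTMLParser):
    """Pairs each date found in page text with the next link after it."""

    def __init__(self):
        super().__init__()
        self.items = []            # (datetime, title, href) tuples
        self._pending_date = None  # last date seen and not yet paired
        self._href = None          # href of the <a> currently open, if any
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            # Text inside a link becomes the item title.
            self._title.append(data)
            return
        match = DATE_RE.search(data)
        if match:
            year, month, day = (int(g) for g in match.groups())
            try:
                # Assumption: midnight UTC, since the page gives no time.
                self._pending_date = datetime(year, month, day, tzinfo=timezone.utc)
            except ValueError:
                pass  # looks like a date but is not a valid one; ignore it

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._title).strip()
            if self._pending_date is not None and title:
                self.items.append((self._pending_date, title, self._href))
                self._pending_date = None
            self._href = None


def build_rss(html, channel_title, channel_link):
    """Returns an RSS 2.0 document (as a string) for the dated links in html."""
    parser = DatedLinkParser()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for published, title, href in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
        # RSS 2.0 expects RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```

Links that are not preceded by a date, such as navigation entries, are skipped, which is the property that makes a time pattern useful as an item marker. Pages without dates would need a different signal, such as the repeated tag patterns or user labelling mentioned in the abstract.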