Robin: extracting visual and textual features from web pages

Authors:
Mizuki Oka;Hiroshi Tsukada;Kazuhiko Kato
Affiliations:
University of Tsukuba;University of Tsukuba;University of Tsukuba
Venue:
APWeb'06: Proceedings of the 8th Asia-Pacific Web Conference on Frontiers of WWW Research and Development
Year:
2006

Abstract

Web pages contain information in several forms. These include textual information such as words and visual information such as images, use of color, and layout. We propose a method of extracting the characteristic features from both the textual and visual information in Web pages. Our method enables seamless integration of the two types of information and automatic extraction of their characteristic features. Based on this method, we developed a proof-of-concept system called Robin, which is designed to provide users with an intuitive way of browsing search engine results. The results of an experimental evaluation of the system showed that it has the potential to be practical and effective.
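
The abstract does not say how Robin represents or combines the two kinds of information, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the text of a page is summarized as relative term frequencies over a fixed vocabulary, that its visual appearance is summarized as a coarse RGB color histogram of a rendered screenshot, and that the two are integrated by simple concatenation. All function names, the vocabulary, and the bin count are hypothetical.

```python
import re
from collections import Counter


def text_features(text, vocab):
    """Relative frequency of each vocabulary word in the page text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocab]


def color_features(pixels, bins=2):
    """Normalized coarse RGB histogram.

    `pixels` is a list of (r, g, b) tuples with channel values in 0..255,
    e.g. sampled from a screenshot of the rendered page (an assumption).
    """
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Clamp so that 255 never falls outside the last bin.
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]


def page_features(text, pixels, vocab, bins=2):
    """Concatenate textual and visual features into one vector."""
    return text_features(text, vocab) + color_features(pixels, bins)


if __name__ == "__main__":
    vocab = ["red", "apple", "blue"]  # hypothetical vocabulary
    text = "Red apple, red car"
    pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
    print(page_features(text, pixels, vocab))
```

Putting both feature types in a single vector means that any standard similarity measure or clustering step can treat textual and visual evidence uniformly. How the two parts should be weighted against each other is left open here.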