Extraction of web texts using content-density distribution

  • Authors: Saori Kitahara; Koya Tamura; Kenji Hatano

  • Affiliations: Graduate School of Culture and Information Science, Doshisha University, Kyoto, Japan; UX Department, Mixi Inc., Tokyo, Japan; Faculty of Culture and Information Science, Doshisha University, Kyoto, Japan

  • Venue: AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
  • Year: 2011

Abstract

We propose a method that captures the content of each Web page and extracts the part of the page related to the query keywords, in order to generate more effective snippets for a Web search engine. We regard the content as the set of words in the text of a Web page, and we generate a content-density distribution from both the position and the influence of each word. In our experiments, the proposed method made the content of Web pages easier to recognize than conventional snippet-based methods did.
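
The abstract does not give the formula for the content-density distribution, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that each word's "influence" is a numeric weight supplied by the caller (the hypothetical `weights` dictionary), that influence spreads to neighbouring word positions through a Gaussian kernel with a hypothetical `bandwidth` parameter, and that the snippet is the fixed-size word window with the highest total density.

```python
import math
import re


def density_distribution(words, weights, bandwidth=3.0):
    """Return one density value per word position.

    Assumption: every weighted word spreads a Gaussian bump over the
    neighbouring positions, scaled by its weight ("influence").
    """
    density = [0.0] * len(words)
    for p, word in enumerate(words):
        w = weights.get(word.lower(), 0.0)
        if w == 0.0:
            continue  # unweighted words contribute nothing
        for i in range(len(words)):
            density[i] += w * math.exp(-((i - p) ** 2) / (2 * bandwidth ** 2))
    return density


def best_window(density, size):
    """Return the start index of the window with the largest total density.

    On ties, the earliest window wins.
    """
    size = min(size, len(density))
    starts = range(len(density) - size + 1)
    return max(starts, key=lambda s: sum(density[s:s + size]))


def extract_snippet(text, weights, size=8, bandwidth=3.0):
    """Return the `size`-word passage where the weighted words are densest."""
    words = re.findall(r"\w+", text)
    if not words:
        return ""
    density = density_distribution(words, weights, bandwidth)
    start = best_window(density, size)
    return " ".join(words[start:start + size])
```

In this sketch the query keywords would simply be given a high weight, for example `extract_snippet(page_text, {"kyoto": 1.0, "temple": 1.0})`, so that the selected window gravitates toward the part of the page where they cluster. A different kernel or a weighting scheme such as TF-IDF could be substituted without changing the overall structure.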