Empirical study on usefulness of algorithm SACwRApper for reputation extraction from the WWW

  • Authors:
  • Hiroyuki Hasegawa; Mineichi Kudo; Atsuyoshi Nakamura

  • Affiliations:
  • Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan (all authors)

  • Venue:
  • KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part IV
  • Year:
  • 2005

Abstract

We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method that uses both structural and content information [1,2]. In our previous paper, we reported good extraction performance of our method only for a ramen-shop dataset written in Japanese. In this paper, we evaluate it on datasets for other kinds of restaurants, and also on a dataset written in English. We also discuss some modifications for improving performance.