Empirical study on usefulness of algorithm SACwRApper for reputation extraction from the WWW

Authors:
Hiroyuki Hasegawa;Mineichi Kudo;Atsuyoshi Nakamura
Affiliations:
Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan;Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan;Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan
Venue:
KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part IV
Year:
2005

Citing 5

Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems

IEPAD: information extraction based on pattern discovery
Proceedings of the 10th international conference on World Wide Web

A flexible learning system for wrapping tables and lists in HTML documents
Proceedings of the 11th international conference on World Wide Web

Mining product reputations on the Web
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Mining and summarizing customer reviews
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Cited 1

Specific-Purpose web searches on the basis of structure and contents
Proceedings of the 2005 international conference on Federation over the Web

Abstract

We consider the problem of extracting texts related to a given keyword from Web pages collected by a search engine. Recently, we proposed a method that uses both structural and content information [1,2]. In our previous paper, we reported good extraction performance of our method only for a Ramen-shop dataset written in Japanese. In this paper, we examine it on datasets for other kinds of restaurants, and also on a dataset written in English. We discuss some modifications for performance improvement.