Record extraction based on user feedback and document selection

  • Authors:
  • Jianwei Zhang; Yoshiharu Ishikawa; Hiroyuki Kitagawa

  • Affiliations:
  • Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan; Information Technology Center, Nagoya University, Nagoya, Aichi, Japan; Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan and Center for Computational Sciences, University of Tsukuba, ...

  • Venue:
  • APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
  • Year:
  • 2007

Abstract

In recent years, research on record extraction from large document collections has become popular. However, some problems remain. 1) When a large document collection is the target of information extraction, the process usually becomes very expensive. 2) The extracted records may not pertain to the user's topic of interest. To address these problems, we propose a method that efficiently extracts the records whose topics agree with the user's interest. To improve the efficiency of the information extraction system, our method identifies the documents from which useful records are likely to be extracted. We use user feedback on extraction results to find topic-related documents and records. Our experiments show that our system achieves high extraction accuracy across different extraction targets.
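
The abstract does not say how documents are selected, so the following is only an illustrative sketch of the general idea of feedback-driven document selection, not the authors' algorithm. It assumes a bag-of-words representation and cosine similarity, and it ranks unprocessed documents by their similarity to documents whose extracted records the user marked as relevant. The names term_vector, cosine, and rank_documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed representation: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(candidates, relevant_docs):
    """Order candidate document ids, most promising first.

    candidates: dict mapping document id to text (not yet processed).
    relevant_docs: texts of documents whose extracted records the user
    marked as relevant (the feedback signal assumed in this sketch).
    """
    # Build a single topic profile from all positively judged documents.
    profile = Counter()
    for text in relevant_docs:
        profile.update(term_vector(text))
    scored = [(cosine(term_vector(text), profile), doc_id)
              for doc_id, text in candidates.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

Under these assumptions, an extraction system could run the expensive extraction step only on the top-ranked documents and update the profile as further feedback arrives.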