Sample-based collection and adjustment algorithm for metadata extraction parameter of flexible format document

  • Authors:
  • Toshiko Matsumoto;Mitsuharu Oba;Takashi Onoyama

  • Affiliations:
  • Research and Development Division, Hitachi Software Engineering Co., Ltd., Tokyo, Japan;Research and Development Division, Hitachi Software Engineering Co., Ltd., Tokyo, Japan;Research and Development Division, Hitachi Software Engineering Co., Ltd., Tokyo, Japan

  • Venue:
  • ICAISC'10 Proceedings of the 10th international conference on Artificial intelligence and soft computing: Part II
  • Year:
  • 2010

Abstract

We propose an algorithm for automatically generating metadata extraction parameters. It first enumerates candidate parameters on the basis of where metadata occurs in training documents, and then examines these candidates to avoid side effects and to maximize effectiveness. This two-stage approach avoids an exponential explosion in computation while still allowing detailed optimization. An experiment on Japanese business documents shows that an automatically generated parameter enables metadata extraction as accurately as a manually adjusted one.
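
The two-stage idea in the abstract can be illustrated with a small sketch. The abstract does not describe the parameter format, so everything here is an assumption for illustration only: a "parameter" is taken to be a keyword that directly precedes the metadata value, a "side effect" is taken to be a wrong extraction on any training document, and the names `Doc`, `extract`, `enumerate_candidates`, and `adjust` are hypothetical. It is not the authors' algorithm, only a toy instance of "enumerate candidates from samples, then filter and select".

```python
from collections import namedtuple

# Hypothetical training sample: a token list plus the labeled metadata token.
Doc = namedtuple("Doc", ["tokens", "value"])


def extract(tokens, keyword):
    """Return the token right after the first occurrence of keyword, or None."""
    for i, tok in enumerate(tokens[:-1]):
        if tok == keyword:
            return tokens[i + 1]
    return None


def enumerate_candidates(docs):
    """Stage 1: collect every keyword that directly precedes a labeled value."""
    candidates = set()
    for doc in docs:
        for i in range(1, len(doc.tokens)):
            if doc.tokens[i] == doc.value:
                candidates.add(doc.tokens[i - 1])
    return candidates


def adjust(docs, candidates):
    """Stage 2: discard candidates that extract a wrong value on any sample
    (assumed notion of a side effect), then greedily keep the candidates
    that add the most newly covered samples."""
    safe = {}
    for kw in candidates:
        hits, wrong = set(), False
        for idx, doc in enumerate(docs):
            got = extract(doc.tokens, kw)
            if got is None:
                continue
            if got == doc.value:
                hits.add(idx)
            else:
                wrong = True
                break
        if not wrong and hits:
            safe[kw] = hits

    chosen, covered = [], set()
    while True:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(safe), key=lambda kw: len(safe[kw] - covered), default=None)
        if best is None or not (safe[best] - covered):
            break
        chosen.append(best)
        covered |= safe[best]
    return chosen, covered


# Made-up sample documents, not data from the paper.
docs = [
    Doc(["Title", "Date:", "2010-01-05", "No.", "17"], "2010-01-05"),
    Doc(["Issued", "2010-02-11", "No.", "8"], "2010-02-11"),
    Doc(["Date:", "2010-03-01", "Issued", "Tokyo"], "2010-03-01"),
]

candidates = enumerate_candidates(docs)
chosen, covered = adjust(docs, candidates)
```

In this toy data, stage 1 proposes both `Date:` and `Issued`, and stage 2 rejects `Issued` because it would extract `Tokyo` from the third document. Stage 1 only considers keywords actually observed next to labeled values, which keeps the candidate set small; stage 2 then checks each candidate against all samples instead of searching every keyword combination.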