ProcessTron: efficient semi-automated markup generation for scientific documents

  • Authors:
  • Guido Sautter; Klemens Böhm; Conny Kühne; Tobias Mathäß

  • Affiliations:
  • KIT, Karlsruhe, Germany; KIT, Karlsruhe, Germany; KIT, Karlsruhe, Germany; KIT, Karlsruhe, Germany

  • Venue:
  • Proceedings of the 10th annual joint conference on Digital libraries
  • Year:
  • 2010

Abstract

Digitizing legacy documents and marking them up with XML is important for many scientific domains. However, creating comprehensive, high-quality semantic markup is challenging. Markup processes consist of many steps, combining automated markup generation with intermediate manual correction, and these corrections are extremely laborious. To reduce this effort, this paper makes two contributions. First, it proposes ProcessTron, a lightweight markup-process-control mechanism. ProcessTron assists users in two ways: it ensures that the steps are executed in the appropriate order, and it points the user to possible errors during manual correction. Second, ProcessTron has been deployed in real-world projects, and this paper reports on our experiences. A core observation is that ProcessTron more than halves the time users need to mark up a document. Results from laboratory experiments that we also conducted confirm this finding.
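
The two kinds of assistance named in the abstract (enforcing step order and pointing to possible errors) can be illustrated with a minimal sketch. This is not ProcessTron's actual implementation, which the abstract does not describe. The document model (a list of paragraph dictionaries), the names Step and ProcessController, and the two example checks are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical document model: one dict of annotations per paragraph.
Document = list[dict]


@dataclass
class Step:
    """One markup step plus a check that lists possible errors left in the document."""
    name: str
    find_possible_errors: Callable[[Document], list[str]]


@dataclass
class ProcessController:
    """Holds the steps in their required order (illustrative, not the paper's design)."""
    steps: list[Step]

    def next_step(self, doc: Document) -> Optional[tuple[Step, list[str]]]:
        # Walk the steps in order; the first one that still reports possible
        # errors is the step the user has to work on next.
        for step in self.steps:
            issues = step.find_possible_errors(doc)
            if issues:
                return step, issues
        return None  # every check passes: the markup process is complete


# Assumed example checks; real checks would depend on the markup schema.
def untyped_paragraphs(doc: Document) -> list[str]:
    return [f"paragraph {i} has no type" for i, p in enumerate(doc) if "type" not in p]


def citations_without_year(doc: Document) -> list[str]:
    return [
        f"citation in paragraph {i} has no year"
        for i, p in enumerate(doc)
        if p.get("type") == "citation" and "year" not in p
    ]


controller = ProcessController(
    steps=[
        Step("structure", untyped_paragraphs),
        Step("citations", citations_without_year),
    ]
)

doc: Document = [
    {"text": "Introduction", "type": "heading"},
    {"text": "Smith, J. A revision of the genus."},
]
step, issues = controller.next_step(doc)
```

In this sketch the controller never offers the "citations" step while the "structure" check still reports issues, which is one simple way to keep steps in order. The issue messages are what would direct the user to the spots that need manual correction.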