Distributed and Parallel Databases
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Named Entity recognition without gazetteers
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Workflow Management: Models, Methods, and Systems
Automated Defect Prevention: Best Practices in Software Management
Empirical evaluation of semi-automated XML annotation of text documents with the GoldenGATE editor
ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Digitizing legacy documents and marking them up with XML is important for many scientific domains. However, creating comprehensive, high-quality semantic markup is challenging. The corresponding processes consist of many steps that combine automated markup generation with intermediate manual correction, and these corrections are extremely laborious. To reduce this effort, this paper makes two contributions. First, it proposes ProcessTron, a lightweight markup-process-control mechanism that assists users in two ways: it ensures that the steps are executed in the appropriate order, and it points the user to possible errors during manual correction. Second, it reports on our experiences with deploying ProcessTron in real-world projects. A core observation is that ProcessTron more than halves the time users need to mark up a document. Laboratory experiments we conducted confirm this finding.
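The first kind of assistance, keeping markup steps in the appropriate order, can be illustrated with a small dependency check. This is a minimal sketch under stated assumptions, not ProcessTron's actual design: it assumes each step simply declares which steps must be complete before it may run. The class `MarkupProcess`, its methods, and the step names are hypothetical and invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model of a markup process (not ProcessTron's real data model):
# each step maps to the set of steps that must be completed before it.
@dataclass
class MarkupProcess:
    prerequisites: dict[str, set[str]]
    done: set[str] = field(default_factory=set)

    def runnable(self) -> list[str]:
        """Return, sorted, the pending steps whose prerequisites are all done."""
        return sorted(
            step
            for step, reqs in self.prerequisites.items()
            if step not in self.done and reqs <= self.done
        )

    def run(self, step: str) -> None:
        """Mark a step as completed, refusing it if any prerequisite is missing."""
        missing = self.prerequisites[step] - self.done
        if missing:
            raise ValueError(f"{step!r} requires {sorted(missing)} first")
        self.done.add(step)


# Hypothetical three-step process: automated tagging, then manual review.
process = MarkupProcess({
    "tokenize": set(),
    "tag_names": {"tokenize"},
    "review_names": {"tag_names"},
})
print(process.runnable())  # ['tokenize']
```

Calling `process.run("review_names")` at this point raises a `ValueError`, because the automated tagging it depends on has not run yet. Keeping the order in an explicit prerequisite map, rather than in a fixed script, lets a tool offer the user whichever steps are currently valid.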