Improving natural language processing by linguistic document annotation

  • Authors:
  • Hideo Watanabe; Katashi Nagao; Michael C. McCord; Arendse Bernth

  • Affiliations:
  • IBM Research, Shimotsuruma, Yamato, Kanagawa, Japan; IBM Research, Shimotsuruma, Yamato, Kanagawa, Japan; IBM T. J. Watson Research Center, Yorktown Heights, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • Proceedings of the COLING-2000 Workshop on Semantic Annotation and Intelligent Content
  • Year:
  • 2000

Abstract

Natural language processing (NLP) programs face various difficulties when processing HTML and XML documents, and can produce better results when linguistic information is annotated in the source text. We have therefore developed the Linguistic Annotation Language (LAL), an XML-compliant tag set for assisting NLP programs. It consists of linguistic information tags, such as tags that mark word and phrase boundaries, and task-dependent instruction tags, such as tags that define the scope of translation for machine translation programs. We have also developed an LAL annotation editor that lets users annotate documents without seeing the tags.
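
As a rough illustration of how an NLP program might consume this kind of annotation, the Python sketch below parses a small annotated fragment and reads off word boundaries, phrase boundaries, and a span excluded from translation. The namespace URI, the element names (w, phrase, mt), and the translate attribute are hypothetical placeholders chosen for this example; they are not taken from the LAL specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and tag names, invented for illustration only.
NS = {"lal": "urn:example:lal"}

# A small HTML-like fragment carrying inline linguistic annotation.
SAMPLE = (
    '<p xmlns:lal="urn:example:lal">'
    '<lal:s>'
    '<lal:phrase><lal:w>The</lal:w> <lal:w>ThinkPad</lal:w></lal:phrase> '
    '<lal:w>runs</lal:w> '
    '<lal:mt translate="no"><lal:w>Linux</lal:w></lal:mt>'
    '</lal:s>'
    '</p>'
)


def words(root):
    """Return the annotated words in document order."""
    return [w.text for w in root.iterfind(".//lal:w", NS)]


def phrases(root):
    """Return each annotated phrase as a space-joined string of its words."""
    return [
        " ".join(w.text for w in p.iterfind(".//lal:w", NS))
        for p in root.iterfind(".//lal:phrase", NS)
    ]


def untranslated_spans(root):
    """Return the text of spans marked as outside the scope of translation."""
    return [
        "".join(e.itertext())
        for e in root.iterfind(".//lal:mt[@translate='no']", NS)
    ]


root = ET.fromstring(SAMPLE)
print(words(root))
print(phrases(root))
print(untranslated_spans(root))
```

Because the annotation is ordinary namespaced XML, a standard parser is enough to recover it, and a program that does not understand the tags can still ignore them and read the plain text.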