Dictionaries, dictionary grammars and dictionary entry parsing

  • Authors:
  • Mary S. Neff; Branimir K. Boguraev

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, New York; IBM T. J. Watson Research Center, Yorktown Heights, New York

  • Venue:
  • ACL '89: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 1989

Abstract

We identify two complementary processes in the conversion of machine-readable dictionaries into lexical databases: recovery of the dictionary structure from the typographical markings which persist on the dictionary distribution tapes and embody the publishers' notational conventions; followed by making explicit all of the codified and elided information packed into individual entries. We discuss notational conventions and tape formats, outline structural properties of dictionaries, observe a range of representational phenomena particularly relevant to dictionary parsing, and derive a set of minimal requirements for a dictionary grammar formalism. We present a general-purpose dictionary entry parser which uses a formal notation designed to describe the structure of entries and performs a mapping from the flat character stream on the tape to a highly structured and fully instantiated representation of the dictionary. We demonstrate the power of the formalism by drawing examples from a range of dictionary sources which have been processed and converted into lexical databases.
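
The two processes named in the abstract can be illustrated with a toy sketch. This is not the authors' parser or grammar formalism: the font codes (`*B`, `*I`, `*R`, `*E`), the field names, and the use of `~` as a stand-in for the headword are all assumptions invented for illustration, and real distribution tapes use publisher-specific markings. The first function recovers structure from typographical markings in a flat character stream; the second makes one kind of elided information explicit.

```python
import re

# Hypothetical font-change codes mapped to hypothetical field names.
# Real tapes use publisher-specific markings; these are assumptions.
FONT_FIELDS = {"B": "headword", "I": "pos", "R": "definition", "E": "example"}


def recover_structure(stream):
    """Stage 1: split the flat stream at font codes and label each segment."""
    entry = {}
    for code, text in re.findall(r"\*([A-Z])([^*]*)", stream):
        field = FONT_FIELDS.get(code)
        if field is None:
            raise ValueError(f"unknown font code: *{code}")
        entry.setdefault(field, []).append(text.strip())
    return entry


def expand_elisions(entry):
    """Stage 2: make elided information explicit.

    Here the only elision handled is '~' standing for the headword
    inside example phrases (an assumed convention).
    """
    headword = entry["headword"][0]
    expanded = dict(entry)
    expanded["example"] = [
        example.replace("~", headword) for example in entry.get("example", [])
    ]
    return expanded


def parse_entry(stream):
    """Run both stages on one entry's character stream."""
    return expand_elisions(recover_structure(stream))


entry = parse_entry("*Bcat*I n*R a small domesticated feline*E the ~ sat on the mat")
```

A table-driven mapping like `FONT_FIELDS` only handles entries whose fields never nest or repeat in context-dependent ways; the abstract's call for a dedicated dictionary grammar formalism suggests real entries need considerably more than this.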