DynGenPar: a dynamic generalized parser for common mathematical language

  • Authors:
  • Kevin Kofler; Arnold Neumaier

  • Affiliations:
  • Faculty of Mathematics, University of Vienna, Wien, Austria; Faculty of Mathematics, University of Vienna, Wien, Austria

  • Venue:
  • CICM'12: Proceedings of the 11th International Conference on Intelligent Computer Mathematics
  • Year:
  • 2012

Abstract

This paper introduces a dynamic generalized parser aimed primarily at common natural mathematical language. Our algorithm combines the efficiency of GLR parsing, the dynamic extensibility of tableless approaches, and the expressiveness of extended context-free grammars such as parallel multiple context-free grammars (PMCFGs). In particular, it supports efficient dynamic addition of rules to the grammar at any moment. The algorithm is fully incremental: parsing can be resumed with additional tokens without restarting the parse process, and the parser can predict the possible next tokens. Additionally, we handle constraints on the token that follows a rule. This makes it possible to enforce grammatically correct English indefinite articles when working with word tokens. It can also express typical operations of scannerless parsing, such as maximal matches, when working with character tokens. Our long-term goal is to computerize a large library of existing mathematical knowledge using the new parser, starting from natural language input as found in textbooks or in the papers collected by the digital mathematical library (DML) projects around the world. In this paper, we present the algorithmic ideas behind our approach, give a short overview of the implementation, and present some efficiency results. The new parser is available at http://www.tigen.org/kevin.kofler/fmathl/dyngenpar/ .
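
To make three of the properties named in the abstract concrete (incremental token feeding, prediction of possible next tokens, and rule additions while a parse is in progress), here is a minimal Python sketch. It is not the DynGenPar algorithm or its API: it is a toy top-down recognizer over plain context-free rules, and the class name `ToyIncrementalRecognizer`, its methods, and the sample grammar are all hypothetical names chosen for illustration. It assumes the grammar has no left-recursive rules and treats any symbol without rules as a terminal.

```python
from typing import Dict, List, Set, Tuple

# A stack lists the symbols still to be matched, leftmost first.
Stack = Tuple[str, ...]


class ToyIncrementalRecognizer:
    """Toy top-down recognizer (illustrative only, not DynGenPar).

    Assumptions: no left-recursive rules; a symbol with no rules
    is treated as a terminal (a token).
    """

    def __init__(self, start: str) -> None:
        self.rules: Dict[str, List[Tuple[str, ...]]] = {}
        self.stacks: Set[Stack] = {(start,)}

    def add_rule(self, lhs: str, rhs: Tuple[str, ...]) -> None:
        """Add a rule at any time; pending nonterminals will see it."""
        self.rules.setdefault(lhs, []).append(rhs)

    def _expand(self) -> Set[Stack]:
        """Rewrite leading nonterminals until every stack is empty
        or starts with a terminal."""
        done: Set[Stack] = set()
        todo = list(self.stacks)
        seen = set(todo)
        while todo:
            stack = todo.pop()
            if not stack or stack[0] not in self.rules:
                done.add(stack)
                continue
            for rhs in self.rules[stack[0]]:
                new = rhs + stack[1:]
                if new not in seen:
                    seen.add(new)
                    todo.append(new)
        return done

    def predict(self) -> Set[str]:
        """Return the tokens that could legally come next."""
        return {s[0] for s in self._expand() if s}

    def feed(self, token: str) -> bool:
        """Consume one token; return False if no parse survives."""
        self.stacks = {s[1:] for s in self._expand() if s and s[0] == token}
        return bool(self.stacks)

    def accepts(self) -> bool:
        """True if the tokens fed so far form a complete sentence."""
        return () in self._expand()


# Usage: parse a prefix, extend the grammar, then resume parsing.
parser = ToyIncrementalRecognizer("S")
parser.add_rule("S", ("let", "VAR", "be", "NP"))
parser.add_rule("VAR", ("x",))
parser.add_rule("NP", ("a", "group"))

for token in ["let", "x", "be"]:
    parser.feed(token)
before = parser.predict()

parser.add_rule("NP", ("an", "integer"))  # rule added mid-parse
after = parser.predict()

parser.feed("an")
parser.feed("integer")
complete = parser.accepts()
```

In this sketch a rule added mid-parse takes effect because nonterminals that have not yet been reached stay unexpanded on the stacks. A generalized parser such as the one described in the abstract would additionally need to handle left recursion, ambiguity, PMCFG features, and next-token constraints, none of which this toy attempts.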