A framework for integrating natural language tools

Authors:
João Graça; Nuno J. Mamede; João D. Pereira
Affiliations:
Spoken Language Systems Lab, L2F – INESC-ID Lisboa/IST, Lisboa, Portugal; Spoken Language Systems Lab, L2F – INESC-ID Lisboa/IST, Lisboa, Portugal; Spoken Language Systems Lab, Software Eng. Group – INESC-ID Lisboa/IST, Lisboa, Portugal
Venue:
PROPOR'06: Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
Year:
2006

Abstract

Natural language processing (NLP) systems typically have a pipeline architecture in which several independently developed NLP tools, connected as a chain of filters, apply successive transformations to the data flowing through the system. Hence, when such tools are integrated, problems arise that lead to information loss: (i) a tool may discard information from its input that a tool further along the pipeline will need; (ii) each tool has its own input/output format. This work proposes a solution to these problems: a framework for building NLP systems. Systems built with this framework use a client-server architecture in which the server acts as a blackboard where all tools add and consult data. Data is kept in the server under a conceptual model that is independent of the client tools, which allows a broad range of linguistic information to be represented. The tools interact with the server through a generic API that supports creating new data and navigating through all existing data. Moreover, we provide libraries, implemented in several programming languages, that abstract the details of the connection and communication protocol between the tools and the server and offer several levels of functionality that simplify use of the server.
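The abstract describes the architecture only at a high level. The following is a minimal, hypothetical Python sketch of the blackboard idea, not the authors' API or conceptual model. It assumes an in-process object in place of the client-server connection and models data as layered text segments (character offsets) carrying feature annotations. All names (`Blackboard`, `add_segment`, `annotate`, `segments`, `within`, the three toy tools, `TOY_LEXICON`) are illustrative. Each tool reads what earlier tools stored and adds its own layer, so no tool can discard data another one needs (problem i), and tools never parse each other's output formats (problem ii).

```python
import re


class Segment:
    """A span of the original text (character offsets) on a named layer."""

    def __init__(self, seg_id, start, end, layer):
        self.seg_id, self.start, self.end, self.layer = seg_id, start, end, layer


class Blackboard:
    """Hypothetical in-memory stand-in for the framework's server.

    Tools never exchange tool-specific formats: each one reads from and
    writes to this shared store, so nothing is dropped between tools.
    """

    def __init__(self, text):
        self.text = text          # the original input is kept, never rewritten
        self._segments = {}       # seg_id -> Segment
        self._annotations = {}    # seg_id -> {feature: value}

    # --- creation of new data ---
    def add_segment(self, start, end, layer):
        seg_id = len(self._segments)
        self._segments[seg_id] = Segment(seg_id, start, end, layer)
        self._annotations[seg_id] = {}
        return seg_id

    def annotate(self, seg_id, feature, value):
        self._annotations[seg_id][feature] = value

    # --- navigation through existing data ---
    def segments(self, layer):
        """All segments on a layer, in text order."""
        found = [s for s in self._segments.values() if s.layer == layer]
        return sorted(found, key=lambda s: (s.start, s.end))

    def within(self, seg_id, layer):
        """Segments on `layer` that lie inside segment `seg_id`."""
        outer = self._segments[seg_id]
        return [s for s in self.segments(layer)
                if outer.start <= s.start and s.end <= outer.end]

    def surface(self, seg_id):
        seg = self._segments[seg_id]
        return self.text[seg.start:seg.end]

    def features(self, seg_id):
        return dict(self._annotations[seg_id])


# Hypothetical client tools; each depends only on the board's API.
def sentence_splitter(board):
    for m in re.finditer(r"\S[^.!?]*[.!?]?", board.text):
        board.add_segment(m.start(), m.end(), "sentence")


def tokenizer(board):
    for m in re.finditer(r"\w+|[^\w\s]", board.text):
        board.add_segment(m.start(), m.end(), "token")


# Toy lexicon for illustration only.
TOY_LEXICON = {"o": "DET", "gato": "NOUN", "cão": "NOUN",
               "dorme": "VERB", "ladra": "VERB"}


def tagger(board):
    # Reads tokens from the board instead of parsing the tokenizer's output.
    for seg in board.segments("token"):
        word = board.surface(seg.seg_id)
        default = "X" if word.isalnum() else "PUNCT"
        board.annotate(seg.seg_id, "pos", TOY_LEXICON.get(word.lower(), default))


if __name__ == "__main__":
    board = Blackboard("O gato dorme. O cão ladra.")
    for tool in (sentence_splitter, tokenizer, tagger):
        tool(board)
    for sent in board.segments("sentence"):
        print([(board.surface(t.seg_id), board.features(t.seg_id)["pos"])
               for t in board.within(sent.seg_id, "token")])
```

Because the tagger asks the board for tokens rather than reading a file in the tokenizer's format, either tool could be replaced without writing a format converter. In the framework described above, the client libraries would presumably hide the connection and protocol details behind calls of roughly this kind.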