Polish language processing chains for multilingual information systems

  • Authors:
  • Maciej Ogrodniczuk;Adam Przepiórkowski

  • Affiliations:
  • Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland;Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

  • Venue:
  • NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
  • Year:
  • 2012

Abstract

The ATLAS project, started in March 2010, aims to create a multilingual language processing framework integrating a common set of linguistic tools for a group of European languages, among them Polish. The chained tools produce multi-level UIMA-encoded annotation of texts, which NLP applications can use for complex language-intensive operations such as automated categorization, information extraction, machine translation or summarization. This paper concentrates on applications of ATLAS language processing chains to multilingual information systems, with particular interest in processing Polish. The inflectional characteristics of this language offer the possibility to comment on a few more advanced functions such as multiword unit lemmatisation, vital for real-life presentation of extracted phrases. Several sample applications using the NLP chain are also presented.
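
To illustrate why multiword unit lemmatisation matters for an inflectional language, the following minimal Python sketch contrasts token-by-token lemmatisation with phrase-level lemmatisation of the genitive phrase "Polskiej Akademii Nauk". It is not the ATLAS implementation: the `Token` fields, the `role` labels and the tiny `NOMINATIVE_SG` lookup table are all hypothetical stand-ins for the output of a real tagger and morphological generator.

```python
from dataclasses import dataclass


@dataclass
class Token:
    orth: str         # surface form as found in the text
    lemma: str        # dictionary lemma of the single token
    role: str         # hypothetical labels: "head", "agree" or "fixed"
    gender: str = ""  # gender of the head noun (hypothetical tag value)


# Toy stand-in for a morphological generator (an assumption, not a real API):
# maps (lemma, gender of the head) to the nominative singular form.
NOMINATIVE_SG = {
    ("polski", "f"): "polska",
    ("akademia", "f"): "akademia",
}


def match_case(form: str, pattern: str) -> str:
    """Capitalise `form` if the original surface form was capitalised."""
    return form.capitalize() if pattern[:1].isupper() else form


def naive_lemma(tokens: list[Token]) -> str:
    """Join per-token lemmas; this loses agreement within the phrase."""
    return " ".join(t.lemma for t in tokens)


def phrase_lemma(tokens: list[Token]) -> str:
    """Put the head and agreeing modifiers in the nominative, keep the rest."""
    head = next(t for t in tokens if t.role == "head")
    parts = []
    for t in tokens:
        if t.role == "fixed":
            # Dependents in a fixed case (here a genitive) stay unchanged.
            parts.append(t.orth)
        else:
            form = NOMINATIVE_SG[(t.lemma, head.gender)]
            parts.append(match_case(form, t.orth))
    return " ".join(parts)


tokens = [
    Token("Polskiej", "polski", "agree"),
    Token("Akademii", "akademia", "head", gender="f"),
    Token("Nauk", "nauka", "fixed"),
]

print(naive_lemma(tokens))   # polski akademia nauka
print(phrase_lemma(tokens))  # Polska Akademia Nauk
```

The token-wise result is not a valid Polish phrase, whereas the phrase-level result is the form a reader would expect to see when an extracted phrase is displayed.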