Dictionary compression and information source correction

  • Authors:
  • Dénes Németh; Máté Lakat; Imre Szeberényi

  • Affiliations:
  • Budapest University of Technology, Budapest, Hungary; Budapest University of Technology, Budapest, Hungary; Budapest University of Technology, Budapest, Hungary

  • Venue:
  • LSSC'09 Proceedings of the 7th international conference on Large-Scale Scientific Computing
  • Year:
  • 2009

Abstract

This paper introduces a method to compress, store, and search a dictionary of a natural language. The dictionary can be represented as groups of words derived from a stem. We describe how to represent and store a word group in a way that is compact and efficiently searchable. The compression efficiency of the algorithm depends strongly on the quality of the information source. The currently available tools and data sources contain several mistakes, which can be corrected with the introduced method. The paper also analyzes the efficiency of XML and two binary formats, and proposes two methods, directed acyclic graph transformation and word group regrouping, that can be used to increase efficiency.
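
The abstract does not give the storage format, so the following is only an illustrative sketch of the general idea of stem-based word groups, not the authors' algorithm. It assumes each word starts with its stem, stores a group as a stem plus a set of suffixes, and keeps each distinct suffix set only once so that several stems can point to it (a simple form of structure sharing, loosely in the spirit of a directed acyclic graph). The function names (`group_by_stem`, `share_suffix_sets`, `contains`) and the English toy data are hypothetical.

```python
from collections import defaultdict


def group_by_stem(pairs):
    """Collect (stem, word) pairs into a mapping stem -> sorted suffix tuple."""
    groups = defaultdict(set)
    for stem, word in pairs:
        # Assumption of this sketch: every word begins with its stem.
        if not word.startswith(stem):
            raise ValueError(f"{word!r} does not start with stem {stem!r}")
        groups[stem].add(word[len(stem):])
    return {stem: tuple(sorted(suffixes)) for stem, suffixes in groups.items()}


def share_suffix_sets(groups):
    """Store each distinct suffix set once; stems refer to it by index."""
    table = []       # distinct suffix sets, in order of first appearance
    index = {}       # suffix set -> position in table
    stem_to_id = {}  # stem -> index of its suffix set
    for stem, suffixes in groups.items():
        if suffixes not in index:
            index[suffixes] = len(table)
            table.append(suffixes)
        stem_to_id[stem] = index[suffixes]
    return stem_to_id, table


def contains(word, stem_to_id, table):
    """Look a word up by trying each of its prefixes as a stem."""
    for cut in range(1, len(word) + 1):
        set_id = stem_to_id.get(word[:cut])
        if set_id is not None and word[cut:] in table[set_id]:
            return True
    return False


# Toy data (hypothetical): two stems with identical suffix sets.
pairs = [
    ("walk", "walk"), ("walk", "walks"), ("walk", "walked"),
    ("talk", "talk"), ("talk", "talks"), ("talk", "talked"),
]
groups = group_by_stem(pairs)
stem_to_id, table = share_suffix_sets(groups)
```

In this toy example both stems share a single stored suffix set, which is where the space saving would come from. A noisy information source that places a word under the wrong stem would create extra, nearly identical suffix sets and reduce that sharing, which is one plausible reading of why source quality matters for compression.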