Text summarization of XML documents in Croatian

  • Authors:
  • Nives Mikelic Preradovic;Tomislava Lauc;Damir Boras

  • Affiliations:
  • University of Zagreb, Faculty of Philosophy, Department of Information Sciences;University of Zagreb, Faculty of Philosophy, Department of Information Sciences;University of Zagreb, Faculty of Philosophy, Department of Information Sciences

  • Venue:
  • CEA'08 Proceedings of the 2nd WSEAS International Conference on Computer Engineering and Applications
  • Year:
  • 2008

Abstract

The paper describes automatic summarization of XML documents in the Croatian language. The goal of the summarizer is to generate extracts with a high percentage of extract-worthiness and similarity to the author's abstract. Our research shows that extracts generated using our algorithm are well formed, but it also shows that the algorithm is very domain-dependent. The research led us to the conclusion that we should develop an implementation of Porter's stemming algorithm in order to improve text summarization for the Croatian language, which is currently at an early stage of development.