PRETO: a high-performance text mining tool for preprocessing Turkish texts

  • Authors: Volkan Tunali; Turgay Tugay Bilgin
  • Affiliations: Maltepe University; Maltepe University
  • Venue: Proceedings of the 13th International Conference on Computer Systems and Technologies
  • Year: 2012

Abstract

Text documents are usually unstructured and written in natural language. Before conventional data mining techniques can be applied to them, a preprocessing step is indispensable. In this paper, we introduce PRETO, a cross-platform, powerful, and scalable tool developed specifically for preprocessing Turkish texts. It offers a wide range of preprocessing options, including stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections.
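To make the four preprocessing options named in the abstract concrete, the following is a minimal Python sketch of such a pipeline. It is not PRETO's implementation or interface, which the abstract does not describe. Every name here (`STOPWORDS`, `turkish_lower`, `tokenize`, `truncate_stem`, `word_ngrams`, `preprocess`) is hypothetical, the stopword list is a tiny illustrative sample, fixed-length prefix truncation is only a crude stand-in for a real Turkish stemmer, and document-frequency thresholds are assumed as one possible form of statistical term filtering.

```python
import re
from collections import Counter

# Tiny illustrative stopword sample (assumption); a real list is far longer.
STOPWORDS = {"ve", "bir", "bu", "için", "ile", "de", "da"}


def turkish_lower(text):
    # str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot,
    # so map the Turkish dotted/dotless capitals explicitly first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text):
    # Keep runs of Unicode letters; digits, underscores, and punctuation split tokens.
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def truncate_stem(token, length=5):
    # Crude stand-in for stemming (assumption): keep the first `length` characters.
    return token[:length]


def word_ngrams(tokens, n):
    # Contiguous word n-grams; n=1 returns the tokens unchanged.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(docs, min_df=1, max_df_ratio=1.0, n=1, stem_length=5):
    processed = []
    for doc in docs:
        # Stopword filtering, then stemming, then n-gram generation.
        tokens = [t for t in tokenize(doc) if t not in STOPWORDS]
        tokens = [truncate_stem(t, stem_length) for t in tokens]
        processed.append(word_ngrams(tokens, n))

    # Statistical term filtering by document frequency (assumed criterion):
    # count how many documents contain each term, then apply the thresholds.
    df = Counter(term for terms in processed for term in set(terms))
    max_df = max_df_ratio * len(docs)
    keep = {term for term, count in df.items() if min_df <= count <= max_df}
    return [[t for t in terms if t in keep] for terms in processed]
```

For example, `preprocess(["Kitaplar ve kalemler", "Kitap okumak için"], min_df=2)` returns `[["kitap"], ["kitap"]]`: the stopwords are dropped, "Kitaplar" and "Kitap" collapse to the same truncated form, and the terms that occur in only one document are filtered out.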