PRETO: a high-performance text mining tool for preprocessing Turkish texts

Authors:
Volkan Tunali; Turgay Tugay Bilgin
Affiliations:
Maltepe University; Maltepe University
Venue:
Proceedings of the 13th International Conference on Computer Systems and Technologies
Year:
2012

Abstract

Text documents are usually unstructured and written in natural language, so a preprocessing step is indispensable before conventional data mining techniques can be applied to them. In this paper, we introduce PRETO, a cross-platform, powerful and scalable tool developed specifically for preprocessing Turkish texts. It offers a wide range of preprocessing options, such as stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections.
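The abstract names four preprocessing options. The Python sketch below chains them on a toy corpus to show roughly what each step does; it is not PRETO's implementation. Several assumptions are ours: the stopword set is a small illustrative subset, "stemming" is approximated by keeping the first k characters of each token (a crude stand-in for a real Turkish morphological stemmer), "statistical term filtering" is taken to mean document-frequency thresholds (`min_df`, `max_df_ratio`), and every function name is hypothetical. Lowercasing applies the Turkish dotted/dotless i rules explicitly, because Python's default `str.lower()` maps "I" to "i" rather than "ı".

```python
import re
from collections import Counter

# Illustrative subset of Turkish stopwords (hypothetical; not PRETO's list).
STOPWORDS = {"ve", "bir", "bu", "da", "de", "için", "ile", "gibi"}


def turkish_lower(text):
    """Lowercase text, mapping I -> ı and İ -> i as Turkish requires."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    """Split into runs of Unicode letters (keeps ç, ğ, ı, ö, ş, ü)."""
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def stem(token, k=5):
    """Hypothetical stand-in stemmer: truncate the token to k characters."""
    return token[:k]


def ngrams(tokens, n):
    """Return contiguous word n-grams joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(docs, n=1, k=5, min_df=2, max_df_ratio=0.9):
    """Tokenize, drop stopwords, stem, build 1..n-grams, then keep only
    terms whose document frequency lies in [min_df, max_df_ratio * N]."""
    processed = []
    for doc in docs:
        tokens = [stem(t, k) for t in tokenize(doc) if t not in STOPWORDS]
        terms = [g for size in range(1, n + 1) for g in ngrams(tokens, size)]
        processed.append(terms)

    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in processed for term in set(terms))
    max_df = max_df_ratio * len(docs)
    vocab = {t for t, c in df.items() if min_df <= c <= max_df}

    # Per-document term counts restricted to the filtered vocabulary.
    return [Counter(t for t in terms if t in vocab) for terms in processed]


if __name__ == "__main__":
    corpus = [
        "Kitaplar ve dergiler",
        "Kitapların sayısı arttı",
        "Dergiler için bir kitaplık",
    ]
    for counts in preprocess(corpus):
        print(dict(counts))
```

On this toy corpus, "kitaplar", "kitapların" and "kitaplık" all truncate to "kitap", which then occurs in all three documents and is removed by the `max_df_ratio` threshold, while terms seen in only one document fall below `min_df`; only "dergi" survives. Raising `n` adds word bigrams and longer n-grams to the candidate terms before the same frequency filter is applied.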