Information retrieval on Turkish texts

Authors:
Fazli Can;Seyit Kocberber;Erman Balcik;Cihan Kaynak;H. Cagdas Ocalan;Onur M. Vursavas
Affiliations:
Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Bilkent, Ankara 06800, Turkey;Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Bilkent, Ankara 06800, Turkey;Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Bilkent, Ankara 06800, Turkey;Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Bilkent, Ankara 06800, Turkey;Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Bilkent, Ankara 06800, Turkey;-
Venue:
Journal of the American Society for Information Science and Technology
Year:
2008

Abstract

In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We also investigate the effects of a range of search conditions on retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing. © 2008 Wiley Periodicals, Inc.
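
As an illustration of the simplest stemming option named in the abstract, word truncation can be sketched as keeping a fixed-length prefix of each token. The following is a minimal sketch, not the authors' implementation: the function names `truncate_stem` and `stem_tokens` are hypothetical, the prefix length of 5 is an assumption (the abstract does not state one), and whitespace tokenization stands in for whatever tokenizer the study used.

```python
def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first prefix_len characters of word.

    Words shorter than prefix_len are returned unchanged.
    prefix_len=5 is an assumed value, not one taken from the abstract.
    """
    return word[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> list[str]:
    """Split text on whitespace (an assumed tokenizer) and truncate each token."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


# Turkish is agglutinative: suffixes attach to a stem, so inflected forms of
# "kitap" (book) share a common prefix and collapse to the same index term.
stems = stem_tokens("kitap kitaplar kitaplardan")
print(stems)
```

Under these assumptions, the three inflected forms in the usage line map to one index term, which is the effect a stemmer is meant to have on query-document matching. Truncation needs no dictionary or morphological analysis, which is why it serves as the simple baseline against the lemmatizer-based stemmer.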