First large-scale information retrieval experiments on Turkish texts

Authors:
Fazli Can;Seyit Kocberber;Erman Balcik;Cihan Kaynak;H. Cagdas Ocalan;Onur M. Vursavas
Affiliations:
Bilkent University, Bilkent, Turkey;Bilkent University, Bilkent, Turkey;Bilkent University, Bilkent, Turkey;Bilkent University, Bilkent, Turkey;Bilkent University, Bilkent, Turkey;Bilkent University, Bilkent, Turkey
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Abstract

We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which was created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and has a size of about 800 MB. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, together with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness across a variety of matching functions.
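The prefix-truncation stemmer mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `truncate_stem` and `stem_tokens`, the whitespace tokenization, and the absence of case folding are all assumptions made for illustration only.

```python
# Illustrative sketch only; names and tokenization are assumptions,
# not taken from the paper.

def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Keep the first prefix_len characters; shorter words are unchanged."""
    return word[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> list[str]:
    """Split on whitespace and truncate each token to its prefix.

    No case folding is applied here: Turkish distinguishes dotted and
    dotless i, so locale-unaware lowercasing could alter words.
    """
    return [truncate_stem(token, prefix_len) for token in text.split()]


if __name__ == "__main__":
    # "kitaplar" (books) and "kitaplardan" (from the books) share the
    # five-character prefix "kitap" (book), so both map to the same term.
    print(stem_tokens("kitaplar kitaplardan ev"))
```

Under this scheme, inflected forms that share their first five characters are conflated into one index term, which is the effect the abstract attributes to truncation at a prefix length of 5.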