Non-English web search: an evaluation of indexing and searching the Greek web

  • Authors:
  • Efthimis N. Efthimiadis;Nicos Malevris;Apostolos Kousaridas;Alexandra Lepeniotou;Nikos Loutas

  • Affiliations:
  • Information School, University of Washington, Seattle, USA;Department of Informatics, Athens University of Economics and Business, Athens, Greece;Department of Informatics, Athens University of Economics and Business, Athens, Greece and Department of Informatics and Telecommunications, University of Athens, Athens, Greece;Department of Informatics, Athens University of Economics and Business, Athens, Greece and Technological Educational Institution, TEI Larisa, Greece;Department of Informatics, Athens University of Economics and Business, Athens, Greece and Information Systems Lab, University of Macedonia, Thessaloniki, Greece

  • Venue:
  • Information Retrieval
  • Year:
  • 2009

Abstract

The study reports on a longitudinal and comparative evaluation of Greek language searching on the web. Ten engines, five global (A9, AltaVista, Google, MSN Search, and Yahoo!) and five Greek (Anazitisi, Ano-Kato, Phantis, Trinity, and Visto), were evaluated by (a) running navigational queries in 2004 and 2006; and (b) measuring the freshness of the search engine indices in 2005 and 2006. Homepage-finding queries for known Greek organizations were created and searched. Queries included the name of the organization in its Greek form and in its non-Greek (English or transliterated) equivalent forms. The organizations represented ten categories: government departments, universities, colleges, travel agencies, museums, media (TV, radio, newspapers), transportation, and banks. The freshness of the indices was evaluated by examining the status of the URLs returned by the navigational queries (live versus dead), and by checking whether the engines had indexed 32,480 active (live) Greek domain URLs. Effectiveness measures included (a) qualitative assessment of how engines handle the Greek language; (b) precision at 10 documents (P@10); (c) mean reciprocal rank (MRR); (d) Navigational Query Discounted Cumulative Gain (NQ-DCG), a new heuristic evaluation measure; (e) response time; (f) the ratio of dead URLs returned; and (g) the presence or absence of URLs and the decay observed over the period of the study. The results report which of the global and Greek search engines perform best, and whether the performance achieved is good enough from a user's perspective.
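
As a rough illustration of three of the measures named in the abstract, the sketch below computes P@10, MRR, and a dead-URL ratio using their standard textbook definitions. It is not the authors' implementation: the function names, the boolean relevance lists, and the "live"/"dead" status strings are all assumptions made for this example. NQ-DCG is omitted because the abstract does not define it.

```python
from typing import Sequence


def precision_at_k(relevant: Sequence[bool], k: int = 10) -> float:
    """P@k: fraction of the top-k results judged relevant.

    Divides by k even if fewer than k results were returned
    (a common convention; assumed here, not taken from the paper).
    """
    return sum(relevant[:k]) / k


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """MRR: average reciprocal rank over a set of queries."""
    return sum(reciprocal_rank(run) for run in runs) / len(runs)


def dead_link_ratio(statuses: Sequence[str]) -> float:
    """Share of returned URLs whose (hypothetical) status label is 'dead'."""
    return sum(status == "dead" for status in statuses) / len(statuses)


# Hypothetical judgments for three navigational queries on one engine:
# True marks the position where the target homepage appears.
runs = [
    [True, False, False],   # homepage found at rank 1
    [False, True, False],   # homepage found at rank 2
    [False, False, False],  # homepage not found
]
mrr = mean_reciprocal_rank(runs)
p10 = precision_at_k(runs[0])
dead = dead_link_ratio(["live", "dead", "dead", "live"])
```

For a homepage-finding query there is typically one correct answer, so the reciprocal rank rewards engines that place the target homepage near the top of the result list.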