Greek is one of the most difficult languages to handle in Web Information Retrieval (IR) tasks. The difficulty stems from the fact that Greek is grammatically, morphologically and orthographically more complex than English, the lingua franca of IR. In this paper, we address several issues that originate from the Greek language. We use a number of techniques to determine the encoding used by web pages written in Greek. We test the effect of using a Greek stopword list in a realistic and controlled Web environment. We employ a character mapping scheme to overcome the diversity of diacritics used in the language, such as accents and the diaeresis. We use word distance and fuzzy similarity metrics to compensate for the different forms in which nouns, verbs and articles appear because of conjugation and inflection, and additionally to handle queries written in Greeklish, a transliterated form of Greek. The experiments show some effective ways to increase accuracy in Greek IR tasks.
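To make the diacritic mapping and word distance ideas concrete, here is a minimal Python sketch. It is not the scheme used in the paper: it assumes that Unicode NFD decomposition is an acceptable stand-in for the character mapping, and that plain Levenshtein distance is an acceptable stand-in for the word distance metric. The function names `strip_diacritics`, `edit_distance` and `similarity` are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Map accented Greek letters to lowercase base letters (assumed mapping)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accent, diaeresis) exposed by the decomposition.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold word-final sigma into regular sigma so both forms compare equal.
    return base.lower().replace("ς", "σ")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1] after diacritic stripping."""
    a, b = strip_diacritics(a), strip_diacritics(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

Under these assumptions, two inflected forms of the same noun, such as "λόγος" and "λόγου", differ by a single substitution after normalization, so a threshold on `similarity` could group them. Any such threshold would have to be tuned experimentally.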