A Heuristic Approach for Recognizing a Document's Language Used for the Internet Search Engine GETESS

Authors:
A. Düsterhöft; S. Gröticke
Affiliations:
-;-
Venue:
DEXA '00 Proceedings of the 11th International Workshop on Database and Expert Systems Applications
Year:
2000

Citing 0
Cited 2

GETESS: Constructing a Linguistic Search Index for an Internet Search Engine
NLDB '00 Proceedings of the 5th International Conference on Applications of Natural Language to Information Systems-Revised Papers

MEMPHIS: a mobile agent-based system for enabling acquisition of multilingual content and providing flexible format internet premium services
Journal of Systems Architecture: the EUROMICRO Journal

Abstract

In this paper, we illustrate how Internet documents can be automatically analyzed to identify each document's language. This language information is then used by the Internet search engine GETESS. The aim of the language-classification heuristics is to ensure that documents with the same content but in different languages (e.g., German and English) are not presented to the user simultaneously as search results. The GETESS search engine provides results only in the language relevant to the user. Consequently, the search-result set is narrower and better fits the needs of the user.
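
The abstract does not spell out the heuristics themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple stopword-counting heuristic and restricts a result list to the user's language. The names `STOPWORDS`, `guess_language`, and `filter_by_language`, as well as the tiny word lists, are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword lists (an assumption for this
# sketch); a practical system would use much larger lists or other cues.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "for", "with", "not"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "für", "von", "ein"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None.

    None is returned when no stopword matches or when two languages tie,
    so that undecidable documents are never assigned a language by chance.
    """
    # Split into lowercase alphabetic tokens (Unicode letters are kept).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    best_count = max(counts.values())
    winners = [lang for lang, count in counts.items() if count == best_count]
    if best_count == 0 or len(winners) > 1:
        return None
    return winners[0]


def filter_by_language(documents, user_language):
    """Keep only the documents classified as the user's language."""
    return [doc for doc in documents if guess_language(doc) == user_language]


if __name__ == "__main__":
    results = [
        "The engine is fast and the index is small",
        "Die Suchmaschine ist schnell und der Index ist nicht groß",
    ]
    print(filter_by_language(results, "de"))
```

In this sketch, a document whose language cannot be decided is dropped from the filtered list; whether GETESS treats such documents the same way is not stated in the abstract.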