A Heuristic Approach for Recognizing a Document's Language Used for the Internet Search Engine GETESS

  • Authors:
  • A. Düsterhöft; S. Gröticke

  • Venue:
  • DEXA '00 Proceedings of the 11th International Workshop on Database and Expert Systems Applications
  • Year:
  • 2000

Abstract

In this paper, we illustrate how Internet documents can be automatically analyzed to identify the language in which they are written. This language information is then used by the Internet search engine GETESS. The aim of the language-classification heuristics is to ensure that documents with the same content but in different languages (e.g., German and English) are not presented to the user simultaneously as search results. The GETESS search engine returns only the results in the language relevant to the user. Consequently, the search-result set is narrower and better fits the needs of the user.
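
The abstract does not spell out the heuristics themselves, so the following is only an illustrative sketch of one common approach, not the method used in GETESS: count how many frequent function words (stopwords) of each candidate language occur in a document, pick the language with the most hits, and keep only the search results whose guessed language matches the user's. The stopword lists, the `min_hits` threshold, and the function names `guess_language` and `filter_results` are all assumptions introduced for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists (assumption: not taken from
# the paper). Real lists would be far larger.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "that", "for", "with", "not"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "für", "ein", "von"},
}


def guess_language(text, min_hits=2):
    """Return the language code with the most stopword hits, or None.

    None is returned when the best score is below min_hits or when the
    two best scores are tied, i.e. when the evidence is too weak.
    """
    # Extract runs of Unicode letters so that words with umlauts stay whole.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    scores = {
        lang: sum(counts[word] for word in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] < min_hits or ranked[0] == ranked[1]:
        return None
    return max(scores, key=scores.get)


def filter_results(documents, user_language):
    """Keep only the documents whose guessed language matches the user's."""
    return [doc for doc in documents if guess_language(doc) == user_language]


if __name__ == "__main__":
    results = [
        "The engine is not the only tool for the job",
        "Die Suchmaschine ist nicht das einzige Werkzeug für die Aufgabe",
    ]
    print(filter_results(results, "en"))
```

With a filter of this kind, a German and an English version of the same page would not both appear in one result list; documents whose language cannot be determined are dropped here, which is a design choice of this sketch rather than something the abstract states.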