GETESS: Constructing a Linguistic Search Index for an Internet Search Engine
NLDB '00 Proceedings of the 5th International Conference on Applications of Natural Language to Information Systems-Revised Papers
Journal of Systems Architecture: the EUROMICRO Journal
In this paper, we illustrate how Internet documents can be analyzed automatically to identify each document's language. This language information is then used by the Internet search engine GETESS. The aim of the language-classification heuristics is to ensure that documents with the same content but in different languages (e.g., German and English) are not presented to the user simultaneously as search results. The GETESS search engine returns only the results in the language relevant to the user. Consequently, the search-result set is smaller and better fits the needs of the user.
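The abstract does not say which heuristics GETESS applies, so the following Python sketch only illustrates the general idea with one common technique, stopword counting. The stopword lists and the function names `guess_language` and `filter_results` are assumptions made for this example and are not taken from the paper.

```python
import re

# Tiny illustrative stopword lists (assumed for this sketch);
# a practical classifier would use much larger lists.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to", "for", "with", "not", "this"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "für", "ein", "von"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    # Split into lowercase alphabetic tokens (Unicode letters only).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Without any stopword hit, the language stays undetermined.
    return best if scores[best] > 0 else None


def filter_results(documents, user_language):
    """Keep only the documents classified as the user's language."""
    return [doc for doc in documents if guess_language(doc) == user_language]
```

For a user whose language is German, `filter_results(docs, "de")` would drop an English version of a page and keep its German counterpart, which is the narrowing effect the abstract describes. Ties between languages fall to whichever language is listed first in `STOPWORDS`; a real system would need a more careful rule.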