Finding Thai Web Pages in Foreign Web Spaces

  • Authors: Kulwadee Somboonviwat; Takayuki Tamura; Masaru Kitsuregawa

  • Affiliations: The University of Tokyo, Japan; Mitsubishi Electric Corporation; The University of Tokyo, Japan

  • Venue: ICDEW '06: Proceedings of the 22nd International Conference on Data Engineering Workshops
  • Year: 2006

Abstract

This paper proposes language-specific web crawling (LSWC) as a method of creating large-scale, language-specific Web archives for countries with linguistic identities, such as Thailand. The LSWC strategy, which selectively gathers Thai web pages from virtually anywhere on the Web, is derived from the results of static analyses of the Thai Web graph. We evaluated the performance of the LSWC strategy using a web crawling simulator.