Thai National Corpus: a progress report

  • Authors:
  • Wirote Aroonmanakun;Kachen Tansiri;Pairit Nittayanuparp

  • Affiliations:
  • Chulalongkorn University;Chulalongkorn University;Chulalongkorn University

  • Venue:
  • ALR7 Proceedings of the 7th Workshop on Asian Language Resources
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents problems and solutions in developing Thai National Corpus (TNC). TNC is designed to be a comparable corpus of British National Corpus. The project aims to collect eighty million words. Since 2006, the project can now collect only fourteen million words. The data is accessible from the TNC Web. Delay in creating the TNC is mainly caused from obtaining authorization of copyright texts. Methods used for collecting data and the results are discussed. Errors during the process of encoding data and how to handle these errors will be described.