Thai National Corpus: a progress report

Authors:
Wirote Aroonmanakun;Kachen Tansiri;Pairit Nittayanuparp
Affiliations:
Chulalongkorn University;Chulalongkorn University;Chulalongkorn University
Venue:
ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Year:
2009

Abstract

This paper presents problems and solutions in developing the Thai National Corpus (TNC). The TNC is designed to be comparable to the British National Corpus. The project aims to collect eighty million words, but since 2006 it has collected only fourteen million. The data is accessible from the TNC Web. The delay in creating the TNC is mainly caused by the process of obtaining authorization to use copyrighted texts. The methods used for collecting data and their results are discussed. Errors that arise while encoding the data, and ways of handling them, are also described.