Zipf's Law and Mandelbrot's Constants for Turkish Language Using Turkish Corpus (TurCo)
ADVIS'04 Proceedings of the Third International Conference on Advances in Information Systems
Determining the statistical properties of a natural language is one of the most important parts of language analysis. Number of Different Words (NODW) and Different Word Usage Ratio (DWUR) are among the general characteristics of a corpus. These values are described and calculated for the Turkish Corpus (TurCo). Word n-grams are also calculated for Turkish; this was done for English years ago but could not be done for Turkish because of the lack of a large-scale corpus. The n-gram results were compared with those of the Brown corpus (a well-known corpus for English), and the similarity between TurCo and the Brown corpus was examined.
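As a rough illustration of the quantities named above, the following Python sketch counts distinct words, a usage ratio, word n-grams, and the rank-frequency list that a Zipf-style analysis would start from. It is not the authors' procedure. The abstract does not define DWUR, so the sketch assumes it is the number of different words divided by the total number of words; the function names and the simple tokenizer are likewise hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on non-word characters.
    # Note: str.lower() does not apply Turkish casing rules (I -> ı),
    # so a real Turkish pipeline would need locale-aware handling.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def corpus_stats(tokens):
    # NODW: number of different (distinct) words in the token list.
    nodw = len(set(tokens))
    # DWUR: assumed here to be distinct words / total words.
    dwur = nodw / len(tokens) if tokens else 0.0
    return nodw, dwur


def word_ngrams(tokens, n):
    # Count every run of n consecutive words.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rank_frequency(tokens):
    # (rank, frequency) pairs, most frequent word first.
    counts = Counter(tokens)
    return [(rank, freq)
            for rank, (_, freq) in enumerate(counts.most_common(), start=1)]


if __name__ == "__main__":
    sample = tokenize("bir iki bir üç bir iki")
    print(corpus_stats(sample))
    print(word_ngrams(sample, 2).most_common(1))
    print(rank_frequency(sample))
```

Plotting the logarithm of frequency against the logarithm of rank from `rank_frequency` is the usual first check of Zipf's law, under which frequency falls off roughly in inverse proportion to rank.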