We observed that the coefficients of two important empirical statistical laws of language, Zipf's law and Heaps' law, differ from language to language, as we illustrate with English and Russian examples. This may have both theoretical and practical implications. On the one hand, the reasons for the difference may shed light on the nature of language. On the other hand, the two laws matter in practice, for example in full-text database design, where they allow the index size to be predicted.
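To make the two coefficients concrete, here is a minimal sketch of how they could be estimated from a tokenized text. It assumes the common formulations: Zipf's law as f(r) ≈ C / r^z, where f is the frequency of the word at rank r, and Heaps' law as V(n) ≈ K · n^β, where V is the number of distinct words among the first n tokens. Both are fitted by ordinary least squares in log-log space. The function names, the symbols z, K and β, and the fitting procedure are illustrative assumptions, not necessarily the method used to obtain the results summarized above.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def zipf_coefficient(tokens):
    """Estimate z in f(r) ~ C / r**z (hypothetical helper).

    Needs at least two distinct word types.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    log_ranks = [math.log(r) for r in range(1, len(freqs) + 1)]
    log_freqs = [math.log(f) for f in freqs]
    _, slope = fit_line(log_ranks, log_freqs)
    return -slope  # the log-log slope is negative; report z as positive


def heaps_coefficients(tokens):
    """Estimate (K, beta) in V(n) ~ K * n**beta (hypothetical helper).

    Needs at least two tokens.
    """
    seen = set()
    log_n, log_v = [], []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        log_n.append(math.log(n))
        log_v.append(math.log(len(seen)))
    intercept, beta = fit_line(log_n, log_v)
    return math.exp(intercept), beta
```

Running both functions on comparable corpora in two languages, for instance English and Russian texts of similar size and genre, would give one pair of estimates per language to compare. For index-size prediction, the fitted Heaps parameters can be plugged into K · n^β for the expected corpus length n.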