Zipf's law states that the frequency of a word token in a large corpus of natural language is inversely proportional to its rank. The law is investigated for two languages, English and Mandarin, and for n-gram word phrases as well as single words. For single words, the law is shown to hold only for high-frequency words. However, when single words and n-gram phrases are combined in one list and ordered by frequency, the combined list follows Zipf's law accurately for all words and phrases, down to the lowest frequencies, in both languages. The Zipf curves for the two languages are then almost identical.
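
The combined ranking described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_n` cutoff, and the use of pre-tokenized input are all assumptions made for illustration, and the abstract does not state which n-gram lengths were used. The sketch counts every n-gram from length 1 up to `max_n`, merges the counts into a single list ordered by frequency, and estimates the slope of log frequency against log rank. Under Zipf's law, where frequency is proportional to 1/rank, that slope should be close to -1.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-token phrase in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def combined_rank_frequency(tokens, max_n=3):
    """Merge counts for n = 1..max_n (max_n is an assumed cutoff) into one
    list of (rank, frequency) pairs, ordered by descending frequency."""
    combined = Counter()
    for n in range(1, max_n + 1):
        # Keys are tuples of different lengths, so counts never collide.
        combined.update(ngram_counts(tokens, n))
    freqs = sorted(combined.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(rank_freq):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

On a real corpus, one would pass the full token sequence to `combined_rank_frequency` and compare the slope from `loglog_slope` with the slope obtained from single words alone (`max_n=1`). Mandarin text would first need word segmentation, which this sketch does not cover.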