n-gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus of samples totaling 1 million words. Similar properties were also derived from the 1000 most frequent words of three other corpora. The positional distributions of the n-grams obtained in the present study are discussed. Statistical studies of word length, and of trends in n-gram frequencies versus vocabulary, are presented. The paper also surveys n-gram statistics found in the literature, and reviews and compares a collection of n-gram statistics obtained by other researchers.
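The abstract does not describe how the statistics were computed. The sketch below is only a minimal illustration of the kinds of counts it mentions, under stated assumptions: n-grams are taken to be letter n-grams counted inside individual words, text is tokenized into lowercase alphabetic words, and an n-gram's "position" is taken to be the 0-based offset where it starts within a word. The function names `words_of`, `ngram_statistics`, and `word_length_distribution` are hypothetical and not from the original study.

```python
from collections import Counter, defaultdict
import re


def words_of(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic word tokens (assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_statistics(words: list[str], n: int) -> tuple[Counter, dict[str, Counter]]:
    """Count letter n-grams inside each word and where in the word each one starts.

    Returns (frequency, positions), where positions[g][i] is the number of
    times n-gram g begins at 0-based character offset i of a word.
    Words shorter than n contribute nothing (assumption: no cross-word n-grams).
    """
    frequency: Counter = Counter()
    positions: dict[str, Counter] = defaultdict(Counter)
    for word in words:
        for i in range(len(word) - n + 1):
            gram = word[i:i + n]
            frequency[gram] += 1
            positions[gram][i] += 1
    return frequency, dict(positions)


def word_length_distribution(words: list[str]) -> Counter:
    """Count how many word tokens have each length."""
    return Counter(len(w) for w in words)


if __name__ == "__main__":
    tokens = words_of("The theory of the thing is that there is nothing there.")
    for n in range(1, 6):  # n = 1 to 5, the range used in the study
        freq, _ = ngram_statistics(tokens, n)
        print(n, freq.most_common(3))
    _, pos2 = ngram_statistics(tokens, 2)
    print("'th' start offsets:", dict(pos2["th"]))
    print("word lengths:", dict(sorted(word_length_distribution(tokens).items())))
```

Counting start offsets per n-gram keeps the positional distribution alongside the raw frequency in a single pass; a real study might instead index positions relative to word length or word end, which this sketch does not attempt.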