Scaling high-order character language models to gigabytes

  • Authors: Bob Carpenter
  • Affiliations: Alias-i, Inc., Brooklyn, NY
  • Venue: Software '05: Proceedings of the Workshop on Software
  • Year: 2005

Abstract

We describe the implementation steps required to scale high-order character language models to gigabytes of training data without pruning. Our online models build character-level PAT trie structures on the fly using heavily data-unfolded implementations of mutable daughter maps with a long integer count interface. Terminal nodes are shared. Character 8-gram training runs at 200,000 characters per second and allows online tuning of hyperparameters. Our compiled models precompute all probability estimates for observed n-grams and all interpolation parameters, along with suffix pointers that speed up context computations from time proportional to n-gram length to constant time. The result is compiled models that are larger than the training models but execute at 2 million characters per second on a desktop PC. Cross-entropy on held-out data shows these models to be state of the art in terms of performance.
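As a rough illustration of the training-time data structure, the sketch below shows a character trie with long counts whose daughter maps are "data-unfolded" into parallel sorted primitive arrays rather than per-node map objects, together with a long-valued count lookup. The class and method names (TrieNode, CharNGramCounter, getOrCreate, train, count) are hypothetical and not drawn from the paper or from the Alias-i toolkit; the actual implementation unfolds daughter maps far more aggressively (for example, specialized node classes by arity) and shares terminal nodes, which this sketch omits.

import java.util.Arrays;

// Hypothetical sketch: a character trie node with a long count and
// "data-unfolded" daughters kept in parallel sorted primitive arrays
// instead of a general-purpose map object.
final class TrieNode {
    long count;                          // occurrences of the path to this node
    char[] daughterChars = new char[0];  // sorted daughter characters
    TrieNode[] daughters = new TrieNode[0];

    // Return the daughter for c, creating and splicing it in if absent.
    TrieNode getOrCreate(char c) {
        int i = Arrays.binarySearch(daughterChars, c);
        if (i >= 0) return daughters[i];
        int at = -(i + 1);
        char[] chars = new char[daughterChars.length + 1];
        TrieNode[] nodes = new TrieNode[daughters.length + 1];
        System.arraycopy(daughterChars, 0, chars, 0, at);
        System.arraycopy(daughters, 0, nodes, 0, at);
        chars[at] = c;
        nodes[at] = new TrieNode();
        System.arraycopy(daughterChars, at, chars, at + 1, daughterChars.length - at);
        System.arraycopy(daughters, at, nodes, at + 1, daughters.length - at);
        daughterChars = chars;
        daughters = nodes;
        return nodes[at];
    }
}

// Hypothetical counter: counts every substring of length <= maxOrder
// once per occurrence by walking the trie from each start position.
final class CharNGramCounter {
    private final TrieNode root = new TrieNode();
    private final int maxOrder;

    CharNGramCounter(int maxOrder) { this.maxOrder = maxOrder; }

    void train(CharSequence cs) {
        for (int start = 0; start < cs.length(); ++start) {
            root.count++;                // empty context observed once per position
            TrieNode node = root;
            int end = Math.min(cs.length(), start + maxOrder);
            for (int i = start; i < end; ++i) {
                node = node.getOrCreate(cs.charAt(i));
                node.count++;
            }
        }
    }

    // Long-valued count lookup, echoing the long integer count interface.
    long count(CharSequence ngram) {
        TrieNode node = root;
        for (int i = 0; i < ngram.length(); ++i) {
            int j = Arrays.binarySearch(node.daughterChars, ngram.charAt(i));
            if (j < 0) return 0L;
            node = node.daughters[j];
        }
        return node.count;
    }

    public static void main(String[] args) {
        CharNGramCounter counter = new CharNGramCounter(8);
        counter.train("abracadabra");
        System.out.println(counter.count("abra"));  // prints 2
    }
}

On top of such counts, the compiled models described in the abstract would replace the mutable nodes with precomputed log probability estimates, interpolation parameters, and suffix pointers, trading additional memory for the constant-time context computations reported above.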