We describe the implementation steps required to scale high-order character language models to gigabytes of training data without pruning. Our online models build character-level PAT trie structures on the fly, using heavily data-unfolded implementations of mutable daughter maps with a long-integer count interface. Terminal nodes are shared. Character 8-gram training runs at 200,000 characters per second and allows online tuning of hyperparameters. Our compiled models precompute all probability estimates for observed n-grams and all interpolation parameters, along with suffix pointers that reduce context computation from time proportional to the n-gram length to constant time. The resulting compiled models are larger than the training models but execute at 2 million characters per second on a desktop PC. Cross-entropy on held-out data shows these models to be state of the art.
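To make the trie construction concrete, the sketch below shows a minimal online character n-gram trie in Java with mutable daughter maps and long counts. It is an illustration under assumptions, not the paper's data-unfolded implementation: the names (CharTrie, Node, train, count) are hypothetical, the daughter maps are plain HashMaps rather than the unfolded node representations described above, and terminal-node sharing, probability precomputation, and suffix pointers are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: online character n-gram trie with long counts.
// Hypothetical names; not the paper's data-unfolded implementation.
public class CharTrie {
    static final int MAX_ORDER = 8; // character 8-grams, as in the paper

    // Each node holds a count and a mutable map from next character to daughter.
    static final class Node {
        long count;
        final Map<Character, Node> daughters = new HashMap<>();
    }

    private final Node root = new Node();

    // For each start position, walk up to MAX_ORDER characters down the trie,
    // creating nodes as needed and incrementing counts along the path. This
    // counts every k-gram (k <= MAX_ORDER) in the text exactly once.
    void train(CharSequence text) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            int limit = Math.min(text.length(), start + MAX_ORDER);
            for (int i = start; i < limit; i++) {
                node = node.daughters.computeIfAbsent(text.charAt(i), c -> new Node());
                node.count++;
            }
        }
    }

    // Raw count of an n-gram; 0 if unseen.
    long count(CharSequence ngram) {
        Node node = root;
        for (int i = 0; i < ngram.length(); i++) {
            node = node.daughters.get(ngram.charAt(i));
            if (node == null) return 0;
        }
        return node.count;
    }

    public static void main(String[] args) {
        CharTrie trie = new CharTrie();
        trie.train("abracadabra");
        System.out.println(trie.count("abra")); // 2
        System.out.println(trie.count("cad"));  // 1
    }
}
```

In the compiled models described above, each observed n-gram node would additionally carry precomputed interpolated probability estimates and a suffix pointer to the node for its context's longest proper suffix, so that advancing the context after each character can be a single pointer dereference rather than a walk from the root.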