We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
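To make the evaluation measure concrete, the following is a minimal sketch of a bigram model smoothed by linear interpolation with a unigram model, scored by the cross-entropy of test data in bits per word. It is not one of the techniques introduced in the paper. The function names, the fixed weight `lam=0.7`, the add-one unigram estimate, and the toy sentences are all assumptions made for illustration. Jelinek-Mercer smoothing in particular would estimate the interpolation weights from held-out data rather than fix them by hand.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect the counts needed for maximum-likelihood estimates."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])  # how often each word is followed by another
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams


def interpolated_prob(prev, word, model, lam=0.7):
    """P(word | prev) as a fixed-weight mix of bigram and unigram estimates.

    lam is a hand-picked illustrative weight, not a tuned value.
    """
    unigrams, contexts, bigrams = model
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1  # one extra slot shared by all unseen words
    p_uni = (unigrams[word] + 1) / (total + vocab)  # add-one unigram (assumption)
    if contexts[prev] == 0:
        return p_uni  # unseen context: rely on the unigram estimate alone
    p_big = bigrams[(prev, word)] / contexts[prev]
    return lam * p_big + (1 - lam) * p_uni


def cross_entropy(tokens, model, lam=0.7):
    """Average negative log2 probability per predicted word (bits per word)."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(interpolated_prob(p, w, model, lam)) for p, w in pairs)
    return -log_sum / len(pairs)


# Toy data, invented for this example.
train_tokens = "the cat sat on the mat the cat ate".split()
test_tokens = "the cat sat on the rug".split()
model = train_bigram(train_tokens)
print(cross_entropy(test_tokens, model))
```

Lower cross-entropy means the model assigns higher probability to the test data. Because the unigram term is never zero, an unseen bigram such as "the rug" still receives some probability, which keeps the cross-entropy finite.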