Investigating GIS and smoothing for maximum entropy taggers

Authors:
James R. Curran; Stephen Clark
Affiliations:
University of Edinburgh, Edinburgh; University of Edinburgh, Edinburgh
Venue:
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar.
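To show why the correction feature can be dropped, here is a standard auxiliary-function argument for conditional maximum entropy models, written out as a sketch. It is not necessarily the derivation the paper gives. The notation is assumed here rather than taken from the paper: p̃ is the empirical distribution, Ẽ an empirical expectation, f^# the total feature count of a (context, tag) pair, and L the conditional log-likelihood. Features are assumed to be non-negative.

```latex
% Conditional maxent model and conditional log-likelihood
p_\lambda(y \mid x) = \frac{\exp\big(\sum_{i=1}^{n} \lambda_i f_i(x,y)\big)}{Z_\lambda(x)},
\qquad
L(\lambda) = \sum_{x,y} \tilde{p}(x,y) \log p_\lambda(y \mid x)

% GIS constant and the traditional correction feature
f^{\#}(x,y) = \sum_{i=1}^{n} f_i(x,y), \qquad
C = \max_{x,y} f^{\#}(x,y), \qquad
f_{n+1}(x,y) = C - f^{\#}(x,y)

% In a conditional model the correction feature adds no expressive power:
% \lambda_{n+1} C is constant in y and cancels in Z(x)
\sum_{i=1}^{n} \lambda_i f_i + \lambda_{n+1} f_{n+1}
  = \lambda_{n+1} C + \sum_{i=1}^{n} (\lambda_i - \lambda_{n+1}) f_i

% GIS update
\lambda_i \leftarrow \lambda_i + \frac{1}{C} \log \frac{\tilde{E}[f_i]}{E_{p_\lambda}[f_i]},
\qquad
E_{p_\lambda}[f_i] = \sum_x \tilde{p}(x) \sum_y p_\lambda(y \mid x) f_i(x,y)

% Without f_{n+1}: f_i \ge 0 and f^{\#} \le C, so by Jensen's inequality
\exp\Big(\sum_{i=1}^{n} \delta_i f_i\Big)
  \le \sum_{i=1}^{n} \frac{f_i}{C} e^{C\delta_i} + 1 - \frac{f^{\#}}{C}

% Combining with -\log u \ge 1 - u applied to Z_{\lambda+\delta}(x) / Z_\lambda(x):
L(\lambda+\delta) - L(\lambda) \ge
  \sum_{i=1}^{n} \delta_i \tilde{E}[f_i] + 1
  - \sum_x \tilde{p}(x) \sum_y p_\lambda(y \mid x)
    \Big[\sum_{i=1}^{n} \frac{f_i(x,y)}{C} e^{C\delta_i} + 1 - \frac{f^{\#}(x,y)}{C}\Big]
  =: A(\delta)

% \partial A / \partial \delta_i = \tilde{E}[f_i] - E_{p_\lambda}[f_i] e^{C\delta_i} = 0
% gives exactly the GIS update above. The slack term does not depend on \delta,
% and A(0) = 0, so each update without f_{n+1} cannot decrease L.
```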
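The following Python sketch places the three ingredients from the abstract side by side: GIS with or without the correction feature, a frequency cutoff, and a Gaussian prior. It illustrates the ideas under stated assumptions and is not the authors' implementation. The function name `train_gis`, the `CORR` key, the toy corpus, and the reading of the cutoff as a minimum training count for each (predicate, tag) feature are all hypothetical. The prior is folded into GIS by solving, for each weight, emp = model * exp(C * d) + (lambda + d) / sigma^2 for the step d with Newton's method. This is one common approach, not necessarily the procedure used in the paper.

```python
import math
from collections import Counter, defaultdict

CORR = "<correction>"  # hypothetical dictionary key for the GIS correction feature


def train_gis(events, tags, iterations=500, correction=False, cutoff=1, sigma=None):
    """Train a conditional maxent tagger with GIS (illustrative sketch).

    events: list of (contextual predicates, gold tag) pairs.
    Features are binary (predicate, tag) pairs seen at least `cutoff` times.
    correction: add the traditional GIS correction (slack) feature.
    sigma: if given, place a Gaussian prior N(0, sigma^2) on each weight.
    """
    n = len(events)
    counts = Counter((p, y) for ctx, y in events for p in ctx)
    feats = sorted(f for f, c in counts.items() if c >= cutoff)
    fset = set(feats)

    def active(ctx, y):
        return [(p, y) for p in ctx if (p, y) in fset]

    # C: the largest number of features active on any (context, tag) pair.
    C = max(len(active(ctx, y)) for ctx, _ in events for y in tags)

    # Empirical expectations, normalised by the number of events.
    emp = defaultdict(float)
    for ctx, y in events:
        act = active(ctx, y)
        for f in act:
            emp[f] += 1.0 / n
        emp[CORR] += (C - len(act)) / n  # correction feature value C - f#(x, y)

    # Simplification: drop the correction feature if it never fires in training.
    keys = feats + ([CORR] if correction and emp[CORR] > 0 else [])
    lam = defaultdict(float)

    def probs(ctx):
        scores = {}
        for y in tags:
            act = active(ctx, y)
            s = sum(lam[f] for f in act)
            if CORR in keys:
                s += lam[CORR] * (C - len(act))
            scores[y] = s
        top = max(scores.values())  # subtract the max for numerical stability
        z = sum(math.exp(s - top) for s in scores.values())
        return {y: math.exp(s - top) / z for y, s in scores.items()}

    def model_expectations():
        model = defaultdict(float)
        for ctx, _ in events:
            p = probs(ctx)
            for y in tags:
                act = active(ctx, y)
                for f in act:
                    model[f] += p[y] / n
                model[CORR] += p[y] * (C - len(act)) / n
        return model

    for _ in range(iterations):
        model = model_expectations()  # all steps use the same expectations
        for f in keys:
            if sigma is None:
                # Plain GIS update: lambda += log(emp / model) / C.
                lam[f] += math.log(emp[f] / model[f]) / C
            else:
                # Solve emp = model * exp(C d) + (lam + d) / sigma^2 for d by
                # Newton's method; the residual g is decreasing and concave in d.
                d = 0.0
                for _ in range(50):
                    e = model[f] * math.exp(C * d)
                    g = emp[f] - e - (lam[f] + d) / sigma ** 2
                    d -= g / (-C * e - 1.0 / sigma ** 2)
                lam[f] += d

    return lam, probs, feats, emp, model_expectations()


if __name__ == "__main__":
    # Hypothetical toy data: every context occurs with both tags.
    A = ["w=run", "suf=un"]
    B = ["w=runs", "suf=ns", "prev=DT"]
    D = ["w=dog", "suf=og", "prev=DT"]
    events = ([(A, "NN")] + [(A, "VB")] * 3 + [(B, "NN")] + [(B, "VB")] * 2
              + [(D, "NN")] * 3 + [(D, "VB")])
    for corr in (False, True):
        _, probs, _, _, _ = train_gis(events, ["NN", "VB"], correction=corr, cutoff=2)
        print("correction" if corr else "no correction", probs(D))
```

Run as a script, the sketch trains on the hypothetical toy corpus with a cutoff of 2, once without and once with the correction feature. The corpus is built so that every context occurs with both tags, which keeps the unsmoothed maximum likelihood weights finite. Because the correction feature is C minus the sum of the other features, its constant part cancels in each context's normaliser, so both runs should converge to the same tag distributions.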