Period disambiguation with maxent model

Authors:
Chunyu Kit;Xiaoyue Liu
Affiliations:
Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong;Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
Venue:
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Year:
2005

Citing 12
Cited 1

A maximum entropy approach to natural language processing

Computational Linguistics
Inducing Features of Random Fields

IEEE Transactions on Pattern Analysis and Machine Intelligence
Information Retrieval

Information Retrieval
Machine Learning

Machine Learning
Maximum entropy models for natural language ambiguity resolution

Maximum entropy models for natural language ambiguity resolution
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Adaptive multilingual sentence boundary disambiguation

Computational Linguistics
Tagging sentence boundaries

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
A maximum entropy approach to identifying sentence boundaries

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
MITRE: description of the Alembic system used for MUC-6

MUC6 '95 Proceedings of the 6th conference on Message understanding
Some applications of tree-based modelling to speech and language

HLT '89 Proceedings of the workshop on Speech and Natural Language
A comparison of algorithms for maximum entropy parameter estimation

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20

Abbreviation recognition with maxent model

CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents our recent work on period disambiguation, the kernel problem in sentence boundary identification, with the maximum entropy (Maxent) model. A number of experiments are conducted on PTB-II WSJ corpus for the investigation of how context window, feature space and lexical information such as abbreviated and sentence-initial words affect the learning performance. Such lexical information can be automatically acquired from a training corpus by a learner. Our experimental results show that extending the feature space to integrate these two kinds of lexical information can eliminate 93.52% of the remaining errors from the baseline Maxent model, achieving an F-score of 99.8227%.