A knowledge-free method for capitalized word disambiguation

Authors:
Andrei Mikheev
Affiliations:
Harlequin Ltd., Edinburgh, UK
Venue:
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Year:
1999

Abstract

In this paper we present an approach to the disambiguation of capitalized words when they are used in positions where capitalization is expected, such as the first word in a sentence or after a period, quotes, etc. Such words can act as proper names or can be just capitalized variants of common words. The main feature of our approach is that it uses a minimum of prebuilt resources and tries to dynamically infer the disambiguation clues from the entire document. The approach was thoroughly tested and achieved about 98.5% accuracy on unseen texts from The New York Times 1996 corpus.
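
The idea of inferring clues from the whole document can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out; it is a simplified heuristic under stated assumptions: a word capitalized in the middle of a sentence elsewhere in the document is taken as evidence for a proper name, and a lowercased occurrence as evidence for a common word. The function names, the tokenizer, and the set of sentence-boundary tokens are all hypothetical.

```python
import re
from collections import Counter

# Assumption: tokens after which capitalization is expected anyway.
BOUNDARY_TOKENS = {".", "!", "?", '"', "'"}


def tokenize(text):
    """Split into words (allowing internal hyphens/apostrophes) and single symbols."""
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|[^\sA-Za-z]", text)


def is_ambiguous_position(tokens, i):
    """True for the first token or a token right after a boundary token."""
    return i == 0 or tokens[i - 1] in BOUNDARY_TOKENS


def disambiguate_capitalized(text):
    """Label capitalized words in ambiguous positions using document-wide counts.

    Returns a list of (token_index, token, label) where label is
    'proper', 'common', or 'unknown' (no evidence, or a tie).
    """
    tokens = tokenize(text)
    lower_seen = Counter()  # occurrences written in lowercase
    upper_seen = Counter()  # capitalized occurrences in unambiguous positions

    # Pass 1: collect evidence from the entire document.
    for i, tok in enumerate(tokens):
        if not tok[0].isalpha():
            continue
        if tok[0].islower():
            lower_seen[tok.lower()] += 1
        elif not is_ambiguous_position(tokens, i):
            upper_seen[tok.lower()] += 1

    # Pass 2: decide each ambiguous capitalized word from that evidence.
    decisions = []
    for i, tok in enumerate(tokens):
        if tok[0].isupper() and is_ambiguous_position(tokens, i):
            key = tok.lower()
            if upper_seen[key] > lower_seen[key]:
                label = "proper"
            elif lower_seen[key] > upper_seen[key]:
                label = "common"
            else:
                label = "unknown"
            decisions.append((i, tok, label))
    return decisions
```

For the text "Brown spoke first. The reporters asked Brown about the vote. Vote counts came later. Nobody objected.", this sketch labels the sentence-initial "Brown" as a proper name because "Brown" also appears capitalized mid-sentence, labels "The" and "Vote" as common words because lowercase forms occur elsewhere, and leaves "Nobody" undecided for lack of evidence. A real system would need a fallback for such undecided cases.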