In this paper we present an approach to disambiguating capitalized words that appear in positions where capitalization is expected, such as the first word of a sentence or the word after a period or a quotation mark. Such words can be proper names or simply capitalized variants of common words. The main feature of our approach is that it uses a minimum of prebuilt resources and tries to infer disambiguation clues dynamically from the entire document. The approach was thoroughly tested and achieved about 98.5% accuracy on unseen texts from The New York Times 1996 corpus.
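The idea of taking clues from the whole document can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes one simplified clue: how a word is written in unambiguous (mid-sentence) positions elsewhere in the same document. If the word appears capitalized there, a capitalized occurrence at a sentence start is labeled a proper name. If it appears lowercase there, the occurrence is labeled a common word. The tokenizer regex, the set of trigger tokens, and the function names `tokenize`, `collect_evidence` and `disambiguate` are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: words (with internal hyphens/apostrophes) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Assumed set of tokens after which capitalization is expected (ambiguous positions).
AMBIGUOUS_TRIGGERS = {".", "!", "?", '"'}


def tokenize(text):
    return TOKEN_RE.findall(text)


def is_ambiguous_position(tokens, i):
    # The first token of the document, or a token right after a sentence end or quote.
    return i == 0 or tokens[i - 1] in AMBIGUOUS_TRIGGERS


def collect_evidence(tokens):
    """Count how each word form is written in unambiguous (mid-sentence) positions."""
    capitalized, lowercase = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if not tok[0].isalpha() or is_ambiguous_position(tokens, i):
            continue
        if tok[0].isupper():
            capitalized[tok.lower()] += 1
        else:
            lowercase[tok.lower()] += 1
    return capitalized, lowercase


def disambiguate(text):
    """Return (index, token, label) for each capitalized word in an ambiguous position."""
    tokens = tokenize(text)
    capitalized, lowercase = collect_evidence(tokens)
    results = []
    for i, tok in enumerate(tokens):
        if not (tok[0].isalpha() and tok[0].isupper()):
            continue
        if not is_ambiguous_position(tokens, i):
            continue
        key = tok.lower()
        if capitalized[key] > lowercase[key]:
            label = "proper"   # seen capitalized mid-sentence elsewhere in the document
        elif lowercase[key] > capitalized[key]:
            label = "common"   # seen in lowercase elsewhere in the document
        else:
            label = "unknown"  # no document-internal evidence either way
        results.append((i, tok, label))
    return results


if __name__ == "__main__":
    text = ("Bush spoke today. Reporters asked Bush about rice. "
            "Rice prices rose. The farmers sell rice.")
    for item in disambiguate(text):
        print(item)
```

On the sample text, the sentence-initial "Bush" is labeled `proper` because "Bush" also appears capitalized mid-sentence. "Rice" is labeled `common` because "rice" appears in lowercase elsewhere. "Reporters" and "The" remain `unknown` because the document gives no evidence about them. A practical system would need further strategies for such cases, and this sketch does not attempt them.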