This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text that correspond to names of people, places, organizations, times/dates, etc. Although NE tagging is typically part of a larger information extraction process, it has other applications, such as improving search in an information retrieval system and post-processing the results of an OCR system. We focus on degraded documents, i.e. documents that lack case and other orthographic information. Examples include the output of speech recognition systems, as well as e-mail. The traditional approach involves retraining an NE tagger on degraded text, which is a cumbersome operation. This paper describes an approach whereby text is first "restored" to its implicit case-sensitive form and subsequently processed by the original NE tagger. Results show that this new approach leads to far less precision loss in NE tagging of degraded documents.
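The restore-then-tag pipeline can be illustrated with a minimal sketch. The abstract does not say how case restoration is performed, so this example assumes a simple unigram truecasing baseline (each lowercased word is mapped to its most frequent cased form in cased training text), which is not necessarily the authors' method. The "original NE tagger" is likewise replaced by a hypothetical case-sensitive gazetteer lookup. All names (`train_case_model`, `restore_case`, `tag_entities`, `TRAIN`, `GAZETTEER`) are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical cased training text used to learn the case model.
TRAIN: List[List[str]] = [
    ["Yesterday", "John", "Smith", "visited", "Buffalo", "."],
    ["He", "said", "John", "Smith", "likes", "Buffalo", "wings", "."],
    ["The", "mayor", "of", "Buffalo", "met", "John", "Smith", "."],
]

# Hypothetical stand-in for a case-sensitive NE tagger: exact-match gazetteer.
GAZETTEER: Dict[str, str] = {"John Smith": "PERSON", "Buffalo": "LOCATION"}


def train_case_model(cased_sentences: List[List[str]]) -> Dict[str, str]:
    """Map each lowercased word to its most frequent cased form (unigram truecaser)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in cased_sentences:
        # Skip sentence-initial tokens: their capitalization is positional, not lexical.
        for token in sentence[1:]:
            counts[token.lower()][token] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def restore_case(tokens: List[str], model: Dict[str, str]) -> List[str]:
    """Restore case token by token; words unseen in training keep their input form."""
    restored = [model.get(t.lower(), t) for t in tokens]
    if restored:
        # Capitalize the sentence-initial token regardless of the model.
        restored[0] = restored[0][:1].upper() + restored[0][1:]
    return restored


def tag_entities(tokens: List[str], gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup; matching is case-sensitive, like the original tagger."""
    max_len = max(len(name.split()) for name in gazetteer)
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in gazetteer:
                entities.append((span, gazetteer[span]))
                i += n
                break
        else:
            i += 1  # no entity starts at position i
    return entities


if __name__ == "__main__":
    model = train_case_model(TRAIN)
    degraded = "john smith visited buffalo .".split()
    print(tag_entities(degraded, GAZETTEER))  # [] : the case-sensitive tagger finds nothing
    restored = restore_case(degraded, model)
    print(restored)  # ['John', 'Smith', 'visited', 'Buffalo', '.']
    print(tag_entities(restored, GAZETTEER))  # [('John Smith', 'PERSON'), ('Buffalo', 'LOCATION')]
```

The design point the sketch mirrors is that only the case model needs training data; the downstream tagger is reused unchanged rather than retrained on degraded text.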