Named entity discovery using comparable news articles

Authors:
Yusuke Shinyama;Satoshi Sekine
Affiliations:
New York University, New York, NY;New York University, New York, NY
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

In this paper we describe a way to discover Named Entities by using the distribution of words in news articles. Named Entity recognition is an important task for today's natural language applications, but it still suffers from data sparseness. We exploit the observation that a Named Entity is likely to appear synchronously in several news articles, whereas a common noun is less likely to do so. Using this characteristic, we obtained rare Named Entities with 90% accuracy just by comparing the time series distributions of a word in two newspapers. Although the recall achieved is not yet sufficient, we believe that this method can be used to strengthen the lexical knowledge of a Named Entity tagger.
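
The idea of comparing a word's time series in two newspapers can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors used, so the Pearson correlation below is an assumption chosen for illustration, as are the function names (`daily_counts`, `pearson`, `synchronicity`) and the toy articles. The sketch counts how often a word occurs per day in each paper and scores how closely the two daily series move together.

```python
from math import sqrt

def daily_counts(articles, word, num_days):
    """Return per-day occurrence counts of `word`.

    `articles` is a list of (day_index, tokens) pairs (hypothetical format).
    """
    counts = [0] * num_days
    for day, tokens in articles:
        counts[day] += tokens.count(word)
    return counts

def pearson(u, v):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)

def synchronicity(paper_a, paper_b, word, num_days):
    """Score how synchronously `word` appears in two newspapers (assumed measure)."""
    return pearson(daily_counts(paper_a, word, num_days),
                   daily_counts(paper_b, word, num_days))

# Toy data, invented for illustration: five days of tokenized articles per paper.
NUM_DAYS = 5
paper_a = [
    (0, ["the", "president", "met", "reporters"]),
    (1, ["yeltsin", "spoke", "to", "the", "president"]),
    (2, ["yeltsin", "resigned"]),
    (3, ["the", "market", "fell"]),
    (4, ["the", "president", "spoke"]),
]
paper_b = [
    (0, ["market", "news"]),
    (1, ["yeltsin", "met", "the", "president"]),
    (2, ["yeltsin", "yeltsin", "resigned"]),
    (3, ["president", "spoke"]),
    (4, ["the", "market"]),
]

for w in ("yeltsin", "president"):
    print(w, round(synchronicity(paper_a, paper_b, w, NUM_DAYS), 3))
```

On this toy data the name, which bursts on the same days in both papers, scores close to 1, while the common noun, which is scattered across different days, scores near or below zero. A real system would presumably also need a threshold and some handling of very rare words; neither is specified in the abstract.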