Using term informativeness for named entity detection

Authors:
Jason D. M. Rennie;Tommi Jaakkola
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA
Venue:
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2005

Citing 6

An Algorithm that Learns What's in a Name
Machine Learning - Special issue on natural language learning

Information Retrieval
Information Retrieval

Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning

TopCat: Data Mining for Topic Identification in a Text Corpus
PKDD '99 Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery

Everything old is new again: a fresh look at historical approaches in machine learning
Everything old is new again: a fresh look at historical approaches in machine learning

Why inverse document frequency?
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies

Cited 10

Detecting research topics via the correlation between graphs and texts
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

A semi-supervised incremental algorithm to automatically formulate topical queries
Information Sciences: an International Journal

Part of Speech Based Term Weighting for Information Retrieval
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval

Semantic-based estimation of term informativeness
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Static pruning of terms in inverted files
ECIR'07 Proceedings of the 29th European conference on IR research

Term-weighting for summarization of multi-party spoken dialogues
MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction

Think globally, apply locally: using distributional characteristics for Hindi named entity identification
NEWS '10 Proceedings of the 2010 Named Entities Workshop

Extracting mnemonic names of people from the web
ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: achievements, Challenges and Opportunities

A new measure for query disambiguation using term co-occurrences
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning

Topic-Oriented words as features for named entity recognition
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Abstract

Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. Other information is necessary for tasks such as named entity detection. How topic-centric, or informative, a word is can be valuable information. It is well known that informative words are best modeled by "heavy-tailed" distributions, such as mixture models. However, informativeness scores do not take full advantage of this fact. We introduce a new informativeness score that directly utilizes mixture model likelihood to identify informative words. We use the task of extracting restaurant names from bulletin board posts as a way to determine effectiveness. We find that our "mixture score" is weakly effective alone and highly effective when combined with Inverse Document Frequency. We compare against other informativeness criteria and find that only Residual IDF is competitive against our combined IDF/Mixture score.
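
For readers unfamiliar with the two baseline scores named in the abstract, the following is a minimal sketch of Inverse Document Frequency and Residual IDF as they are commonly defined (Residual IDF being observed IDF minus the IDF expected if a word's occurrences followed a Poisson distribution). It does not reproduce the paper's mixture score, whose exact form is not given in this record. The function names and the toy counts are assumptions made for illustration only.

```python
import math


def idf(doc_freq, num_docs):
    """Inverse document frequency in bits: -log2(df / N)."""
    return -math.log2(doc_freq / num_docs)


def residual_idf(doc_freq, coll_freq, num_docs):
    """Observed IDF minus the IDF a Poisson model would predict.

    coll_freq is the word's total number of occurrences in the collection.
    Under a Poisson model with rate coll_freq / num_docs, the probability
    that a document contains the word at least once is 1 - exp(-rate).
    """
    rate = coll_freq / num_docs
    expected_idf = -math.log2(1.0 - math.exp(-rate))
    return idf(doc_freq, num_docs) - expected_idf


# Toy counts (hypothetical): both words occur 100 times in 1000 documents.
num_docs = 1000
spread = residual_idf(doc_freq=95, coll_freq=100, num_docs=num_docs)  # evenly spread
bursty = residual_idf(doc_freq=10, coll_freq=100, num_docs=num_docs)  # concentrated
print(f"spread word: {spread:.3f}  bursty word: {bursty:.3f}")
```

With these toy counts, the word concentrated in a few documents receives a much higher Residual IDF than the evenly spread word, which is the "heavy-tailed" behaviour the abstract associates with informative words.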