Informal communication (e-mail, bulletin boards) poses a difficult learning environment because traditional grammatical and lexical information is noisy. Other information is necessary for tasks such as named entity detection. How topic-centric, or informative, a word is can be a valuable signal. It is well known that informative words are best modeled by "heavy-tailed" distributions, such as mixture models. However, existing informativeness scores do not take full advantage of this fact. We introduce a new informativeness score that directly uses mixture model likelihood to identify informative words. We evaluate it on the task of extracting restaurant names from bulletin board posts. We find that our "mixture score" is weakly effective alone and highly effective when combined with Inverse Document Frequency (IDF). We compare against other informativeness criteria and find that only Residual IDF is competitive with our combined IDF/Mixture score.
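For readers unfamiliar with the baseline criteria named above, the following is a minimal sketch of IDF and Residual IDF in their common textbook form, where Residual IDF is the observed IDF minus the IDF expected if the word's occurrences followed a Poisson distribution. It does not implement the mixture score, whose definition is not given here. The function name `informativeness_scores` and the toy posts are hypothetical, and the formulas are an assumption about the standard definitions rather than the exact variants used in the evaluation.

```python
import math
from collections import Counter


def informativeness_scores(docs):
    """Return {word: (idf, residual_idf)} for a list of tokenized documents.

    Assumes the textbook definitions:
      idf  = -log2(df / N)
      ridf = idf - expected_idf, with expected_idf = -log2(1 - exp(-cf / N))
    where df is document frequency, cf is collection frequency, and N is
    the number of documents.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing the word
    coll_freq = Counter()  # total occurrences of the word
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))

    scores = {}
    for word, df in doc_freq.items():
        idf = -math.log2(df / n_docs)
        rate = coll_freq[word] / n_docs  # Poisson rate per document
        expected_idf = -math.log2(1.0 - math.exp(-rate))
        scores[word] = (idf, idf - expected_idf)
    return scores


# Hypothetical toy corpus: "sushi" is bursty, "the" is spread evenly.
posts = [
    ["sushi", "sushi", "sushi", "the"],
    ["the", "food"],
    ["the", "was"],
    ["the", "good"],
]
scores = informativeness_scores(posts)
```

In this toy corpus the bursty word "sushi" receives a higher Residual IDF than the evenly spread word "the", which is the behavior that motivates treating burstiness as evidence of informativeness.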