Locating complex named entities in web text

Authors:
Doug Downey;Matthew Broadhead;Oren Etzioni
Affiliations:
Turing Center, Department of Computer Science and Engineering, University of Washington, Seattle, WA (all three authors)
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Year:
2007

Abstract

Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of pre-defined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes, which are not known in advance. Thus, hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method's F1 score is 50% higher than that of supervised techniques including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs) when applied to complex names. The method also outperforms CMMs and CRFs by 117% on entity classes absent from the training data. Finally, our method outperforms a semi-supervised CRF by 73%.
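
To make the key observation concrete, here is a minimal sketch of detecting a name as a multiword unit from n-gram counts. It is not the paper's implementation: the toy corpus, the function names (ngram_counts, scp, locate_names), the choice of symmetric conditional probability as the association score, the 0.5 threshold, and the restriction to runs of capitalized tokens are all assumptions made for illustration. In practice the counts would come from a large Web corpus rather than four sentences.

```python
from collections import Counter


def ngram_counts(sentences):
    """Count unigrams and adjacent bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def scp(x, y, unigrams, bigrams):
    """Symmetric conditional probability P(y|x) * P(x|y), from raw counts.

    Assumed association measure; other collocation scores could be swapped in.
    """
    joint = bigrams[(x, y)]
    if joint == 0:
        return 0.0
    return joint * joint / (unigrams[x] * unigrams[y])


def locate_names(tokens, unigrams, bigrams, threshold=0.5):
    """Group adjacent capitalized tokens into one name when they co-occur strongly."""
    names, current = [], []
    for tok in tokens:
        if not tok[:1].isupper():
            # A lowercase token ends any name currently being built.
            if current:
                names.append(" ".join(current))
                current = []
            continue
        # Start a new name if this token is weakly associated with the previous one.
        if current and scp(current[-1], tok, unigrams, bigrams) < threshold:
            names.append(" ".join(current))
            current = []
        current.append(tok)
    if current:
        names.append(" ".join(current))
    return names


# Hypothetical toy corpus standing in for Web-scale n-gram statistics.
corpus = [
    "we watched Pulp Fiction last night".split(),
    "Pulp Fiction is a great film".split(),
    "last night we saw a great film".split(),
    "the film Pulp Fiction won awards".split(),
]
unigrams, bigrams = ngram_counts(corpus)

found = locate_names("in Seattle Pulp Fiction sold out".split(), unigrams, bigrams)
print(found)
```

In this sketch "Pulp" and "Fiction" are merged because they always appear together in the toy corpus, while "Seattle" and "Pulp" are kept apart because that bigram is never observed. The sketch deliberately ignores names that contain lowercase connecting words (for example, titles containing "of the"), which are part of what makes complex names hard to delimit.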