Named Entity Recognition (NER) is the task of locating and classifying names in text. Previous work on NER has been limited to a small number of predefined entity classes (e.g., people, locations, and organizations). NER on the Web, however, is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes that are not known in advance, so hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multiword unit, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method's F1 score is 50% higher than that of supervised techniques, including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs), when applied to complex names. The method also outperforms CMMs and CRFs by 117% on entity classes absent from the training data. Finally, our method outperforms a semi-supervised CRF by 73%.
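The key observation above, that named entities behave like multiword units detectable from n-gram statistics, can be illustrated with a toy sketch. The abstract does not name the statistic or the segmentation procedure, so everything here is an assumption: adjacent capitalized words are scored with symmetric conditional probability (SCP), a standard association measure for multiword units, and merged only when the score reaches a hypothetical threshold `tau`. The helper names (`count_ngrams`, `scp`, `locate_entities`), the tiny corpus, and the rule that skips sentence-initial tokens are all illustrative, not the authors' implementation.

```python
from collections import Counter


def count_ngrams(sentences, max_n=2):
    """Count every n-gram (n = 1..max_n) in a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp(counts, left, right):
    """Symmetric conditional probability of the bigram (left, right).

    SCP = p(left right)^2 / (p(left) * p(right)); the corpus size cancels,
    so raw counts are enough.
    """
    joint = counts[(left, right)]
    denom = counts[(left,)] * counts[(right,)]
    return joint * joint / denom if denom else 0.0


def locate_entities(tokens, counts, tau=0.5):
    """Split runs of capitalized words wherever adjacent words associate weakly.

    tau is a hypothetical merge threshold chosen for this toy example.
    """
    entities, current = [], []
    for i, tok in enumerate(tokens):
        # Assumed heuristic: ignore the sentence-initial token, whose
        # capitalization carries no signal.
        is_cap = i > 0 and tok[:1].isupper()
        if is_cap and current and scp(counts, current[-1], tok) >= tau:
            current.append(tok)  # strong association: extend the entity
            continue
        if current:
            entities.append(" ".join(current))  # close the current entity
        current = [tok] if is_cap else []
    if current:
        entities.append(" ".join(current))
    return entities


# Toy stand-in for Web-scale n-gram counts (illustrative data only).
corpus = [s.split() for s in [
    "we watched Star Wars again",
    "the film Star Wars is old",
    "fans of Star Wars love it",
    "on Monday Google announced profits",
    "analysts said Google announced layoffs",
    "every Monday we meet",
    "it rained on Monday morning",
]]
counts = count_ngrams(corpus)

print(locate_entities("we watched Star Wars again".split(), counts))          # ['Star Wars']
print(locate_entities("on Monday Google announced profits".split(), counts))  # ['Monday', 'Google']
```

Because SCP divides the squared joint count by both unigram counts, a pair like "Star Wars" that nearly always co-occurs scores 1.0, while "Monday Google", whose words mostly appear apart, scores 1/6 and is split into two names. A Web-scale version would read the counts from a large n-gram index rather than an in-memory `Counter`.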