Automatic recognition of named entities such as people, places, organizations, books, and movies across the entire web presents a number of challenges, both of scale and scope. Data for training general named entity recognizers is difficult to come by, and efficient machine learning methods are required once we have found hundreds of millions of labeled observations. We present an implemented system that addresses these issues, including a method for automatically generating training data, and a multi-class online classification training method that learns to recognize not only high level categories such as place and person, but also more fine-grained categories such as soccer players, birds, and universities. The resulting system gives precision and recall performance comparable to that obtained for more limited entity types in much more structured domains such as company recognition in newswire, even though web documents often lack consistent capitalization and grammatical sentence construction.
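The abstract does not say which update rule the multi-class online training method uses. As an illustration only, the sketch below assumes a plain multiclass perceptron over sparse features, which is a stand-in technique and not necessarily the system's actual learner. The class name, feature names (such as `ctx:born_in`), labels, and toy examples are all hypothetical. It shows the general shape of online training: each labeled observation is seen once per pass, and weights change only when the prediction is wrong.

```python
class OnlineMulticlassPerceptron:
    """Hypothetical stand-in learner: one sparse weight vector per label."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = {label: {} for label in self.labels}

    def score(self, features, label):
        # Dot product between the sparse feature dict and the label's weights.
        w = self.weights[label]
        return sum(w.get(name, 0.0) * value for name, value in features.items())

    def predict(self, features):
        # Highest-scoring label; ties go to the earliest label in the list.
        return max(self.labels, key=lambda label: self.score(features, label))

    def update(self, features, gold):
        # Mistake-driven update: promote the gold label, demote the wrong guess.
        predicted = self.predict(features)
        if predicted != gold:
            for name, value in features.items():
                gold_w = self.weights[gold]
                pred_w = self.weights[predicted]
                gold_w[name] = gold_w.get(name, 0.0) + value
                pred_w[name] = pred_w.get(name, 0.0) - value
        return predicted


# Toy stream of (features, label) pairs; all names here are invented.
stream = [
    ({"ctx:born_in": 1.0, "cap": 1.0}, "person"),
    ({"ctx:located_in": 1.0, "cap": 1.0}, "place"),
    ({"ctx:studied_at": 1.0, "tok:university": 1.0}, "university"),
]

model = OnlineMulticlassPerceptron(["person", "place", "university"])
for _ in range(3):  # a few passes over the toy stream
    for features, gold in stream:
        model.update(features, gold)

print(model.predict({"ctx:located_in": 1.0}))
```

Because each update touches only the features present in one observation, memory and time per example stay small, which is the property that makes online methods attractive when the training set reaches hundreds of millions of observations.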