WordNet: a lexical database for English
Communications of the ACM
OCELOT: a system for summarizing Web pages
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Learning Algorithms for Keyphrase Extraction
Information Retrieval
Maximum entropy models for named entity recognition
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Machine Learning
Incorporating non-local information into information extraction systems by Gibbs sampling
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Wikify!: linking documents to encyclopedic knowledge
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Introduction to Information Retrieval
Introduction to Information Retrieval
Web-scale named entity recognition
Proceedings of the 17th ACM conference on Information and knowledge management
Learning to link with wikipedia
Proceedings of the 17th ACM conference on Information and knowledge management
A ranking approach to keyphrase extraction
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
TextRunner: open information extraction on the web
NAACL-Demonstrations '07 Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
Domain-specific keyphrase extraction
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Ad-hoc object retrieval in the web of data
Proceedings of the 19th international conference on World wide web
ICADL'10 Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries
Scikit-learn: Machine Learning in Python
The Journal of Machine Learning Research
Proceedings of the 21st international conference on World Wide Web
Combining inverted indices and structured search for ad-hoc object retrieval
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
TwiNER: named entity recognition in targeted twitter stream
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Unsupervised graph-based topic labelling using dbpedia
Proceedings of the sixth ACM international conference on Web search and data mining
ClausIE: clause-based open information extraction
Proceedings of the 22nd international conference on World Wide Web
Large-scale linked data integration using probabilistic reasoning and crowdsourcing
The VLDB Journal — The International Journal on Very Large Data Bases
Automatic keyphrase extraction from scientific articles
Language Resources and Evaluation
Hi-index | 0.00 |
Named Entity Recognition (NER) plays an important role in a variety of online information management tasks including text categorization, document clustering, and faceted search. While recent NER systems can achieve near-human performance on certain documents like news articles, they still remain highly domain-specific and thus cannot effectively identify entities such as original technical concepts in scientific documents. In this work, we propose novel approaches for NER on distinctive document collections (such as scientific articles) based on n-grams inspection and classification. We design and evaluate several entity recognition features---ranging from well-known part-of-speech tags to n-gram co-location statistics and decision trees---to classify candidates. In addition, we show how the use of external knowledge bases (either specific like DBLP or generic like DBPedia) can be leveraged to improve the effectiveness of NER for idiosyncratic collections. We evaluate our system on two test collections created from a set of Computer Science and Physics papers and compare it against state-of-the-art supervised methods. Experimental results show that a careful combination of the features we propose yield up to 85% NER accuracy over scientific collections and substantially outperforms state-of-the-art approaches such as those based on maximum entropy.