Toward completeness in concept extraction and classification

  • Authors:
  • Eduard Hovy; Zornitsa Kozareva; Ellen Riloff

  • Affiliations:
  • USC Information Sciences Institute, Marina del Rey, CA; USC Information Sciences Institute, Marina del Rey, CA; University of Utah, Salt Lake City, UT

  • Venue:
  • EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2
  • Year:
  • 2009

Abstract

Many algorithms extract terms from text together with some kind of taxonomic classification (is-a) link. However, the general approaches used today, and specifically the methods of evaluating results, exhibit serious shortcomings. Harvesting without focusing on a specific conceptual area may deliver large numbers of terms, but they are scattered over an immense concept space, making Recall judgments impossible. Regarding Precision, simply judging the correctness of terms and their individual classification links may provide high scores, but this does not help with the eventual assembly of terms into a single coherent taxonomy. Furthermore, since there is no correct and complete gold standard to measure against, most work invents some ad hoc evaluation measure. We present an algorithm that is more precise and complete than previous ones for identifying from web text just those concepts 'below' a given seed term. Comparing the results to WordNet, we find that the algorithm misses terms, but also that it learns many new terms not in WordNet, and that it classifies them in ways acceptable to humans but different from WordNet.
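
The abstract does not spell out the extraction procedure itself. As a rough, purely illustrative sketch of the kind of pattern-based harvesting it alludes to (collecting candidate terms 'below' a seed concept via is-a cues in text), the snippet below applies a single Hearst-style pattern, "<seed>s such as X, Y and Z". The function name harvest_hyponyms, the seed "animal", and the toy corpus are invented for this example and should not be read as the authors' actual method.

```python
import re
from collections import Counter

def harvest_hyponyms(seed, corpus):
    """Collect candidate terms appearing 'below' a seed concept via a single
    Hearst-style is-a cue: '<seed>s such as X, Y and Z'. Illustrative only."""
    pattern = re.compile(
        rf"\b{re.escape(seed)}s?\s+such\s+as\s+"
        r"(\w+(?:,\s*\w+)*(?:,?\s+(?:and|or)\s+\w+)?)",
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in corpus:
        for match in pattern.finditer(sentence):
            # Split the enumerated list into individual single-word candidates
            # (a simplification: multi-word terms are not handled here).
            for term in re.split(r",\s*|\s+(?:and|or)\s+", match.group(1)):
                term = term.strip().lower()
                if term:
                    counts[term] += 1
    return counts

# Toy usage with an invented seed and corpus.
corpus = [
    "Animals such as dogs, cats and wolverines were observed.",
    "We saw animals such as zebras and okapis at the zoo.",
]
print(harvest_hyponyms("animal", corpus).most_common())
# -> [('dogs', 1), ('cats', 1), ('wolverines', 1), ('zebras', 1), ('okapis', 1)]
```

A system along the lines the abstract describes would apply such patterns over web-scale text, bootstrap further candidates, and compare the harvested terms and their classification links against WordNet; this sketch only shows the core harvesting step on a toy corpus.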