Extracting and evaluating general world knowledge from the Brown corpus

Authors:
Lenhart Schubert; Matthew Tong
Affiliations:
University of Rochester; University of Rochester
Venue:
HLT-NAACL-TEXTMEANING '03 Proceedings of the HLT-NAACL 2003 workshop on Text meaning - Volume 9
Year:
2003


Abstract

We have been developing techniques for extracting general world knowledge from miscellaneous texts by a process of approximate interpretation and abstraction, focusing initially on the Brown corpus. We apply interpretive rules to clausal patterns and patterns of modification, and concurrently abstract general "possibilistic" propositions from the resulting formulas. Two examples are "A person may believe a proposition" and "Children may live with relatives". Our methods currently yield over 117,000 such propositions (of variable quality) for the Brown corpus (more than 2 per sentence). We report here on our efforts to evaluate these results with a judging scheme aimed at determining how many of these propositions pass muster as "reasonable general claims" about the world in the opinion of human judges. We find that nearly 60% of the extracted propositions are favorably judged according to our scheme by any given judge. The percentage unanimously judged to be reasonable claims by multiple judges is lower, but still sufficiently high to suggest that our techniques may be of some use in tackling the long-standing "knowledge acquisition bottleneck" in AI.
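
The abstract distinguishes the share of propositions accepted by any given judge from the share accepted unanimously by several judges. The sketch below illustrates why the second figure cannot exceed the first. It is not the authors' judging scheme: it assumes each judgment is reduced to a binary accept/reject, and the function names (`per_judge_rate`, `unanimous_rate`) and the toy judgment matrix are hypothetical, made up purely for illustration.

```python
from typing import Sequence


def per_judge_rate(judgments: Sequence[Sequence[bool]]) -> float:
    """Fraction of (proposition, judge) pairs marked acceptable.

    Equivalent to the acceptance rate of an average judge when every
    judge rates every proposition.
    """
    n_props = len(judgments)
    n_judges = len(judgments[0])
    accepted = sum(sum(row) for row in judgments)
    return accepted / (n_props * n_judges)


def unanimous_rate(judgments: Sequence[Sequence[bool]]) -> float:
    """Fraction of propositions that every judge marked acceptable."""
    return sum(all(row) for row in judgments) / len(judgments)


# Hypothetical toy data, NOT results from the paper:
# one row per proposition, one column per judge.
toy_judgments = [
    [True, True, True],
    [True, False, True],
    [False, False, False],
    [True, True, False],
    [True, True, True],
]

print(f"per-judge acceptance: {per_judge_rate(toy_judgments):.3f}")
print(f"unanimous acceptance: {unanimous_rate(toy_judgments):.3f}")
```

A proposition counts toward the unanimous rate only if all of its judgments are favorable, so the unanimous rate is bounded above by the per-judge rate. The gap between the two grows as judges disagree more.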