Weblogs as a source for extracting general world knowledge

Authors: Jonathan Gordon, Benjamin Van Durme, Lenhart Schubert
Affiliations: University of Rochester, Rochester, NY, USA (all authors)
Venue: Proceedings of the fifth international conference on Knowledge capture
Year: 2009

Abstract

Knowledge extraction (KE) efforts have often used corpora of heavily edited writing and sources written to provide the desired knowledge (e.g., newspapers or textbooks). However, the proliferation of diverse, up-to-date, unedited writing on the Web, especially in weblogs, offers new challenges for KE tools. We describe our efforts to extract general knowledge implicit in this noisy data and examine whether such sources can be an adequate substitute for resources like Wikipedia.