Automatic knowledge extraction from documents

Authors:
J. Fan; A. Kalyanpur; D. C. Gondek; D. A. Ferrucci
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY (all authors)
Venue:
IBM Journal of Research and Development
Year:
2012


Abstract

Access to a large amount of knowledge is critical for answering open-domain questions in DeepQA systems such as IBM Watson™. Formally represented knowledge is easy to reason with, but acquiring structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is that shallow syntactic knowledge and its implied semantics can be easily acquired and can be used in many areas of a question-answering system. We take a two-stage approach to extracting the syntactic knowledge and implied semantics. First, shallow knowledge is automatically extracted from large collections of documents. Second, additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge. In this paper, we describe in detail what kind of shallow knowledge is extracted, how it is automatically extracted from a large corpus, and how additional semantics are inferred from aggregate statistics. We also briefly discuss the various ways the extracted knowledge is used throughout the IBM DeepQA system.
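
The two-stage approach described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the shallow knowledge is modeled as (subject, verb, object) frames read off hypothetical dependency edges, the relation labels "subj" and "obj" are invented, and the aggregate statistic is pointwise mutual information between a verb and its object, which may differ from the statistics the authors actually use.

```python
import math
from collections import Counter

def extract_frames(edges):
    """Stage 1 (illustrative): build (subject, verb, object) frames from one
    sentence's dependency edges, given as (relation, head, dependent) tuples.
    The edge format and relation labels are assumptions, not the paper's."""
    subjects, objects = {}, {}
    for relation, head, dependent in edges:
        if relation == "subj":
            subjects[head] = dependent
        elif relation == "obj":
            objects[head] = dependent
    # Keep only verbs that have both a subject and an object.
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]

def aggregate(frames):
    """Stage 2a (illustrative): count verb-object pairs over all frames."""
    pair_counts, verb_counts, obj_counts = Counter(), Counter(), Counter()
    for _, verb, obj in frames:
        pair_counts[(verb, obj)] += 1
        verb_counts[verb] += 1
        obj_counts[obj] += 1
    return pair_counts, verb_counts, obj_counts, len(frames)

def pmi(verb, obj, stats):
    """Stage 2b (illustrative): pointwise mutual information, in bits, between
    a verb and an object. Returns -inf for pairs that were never observed."""
    pair_counts, verb_counts, obj_counts, total = stats
    joint = pair_counts[(verb, obj)]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * total / (verb_counts[verb] * obj_counts[obj]))

# Toy "corpus": each inner list is one parsed sentence (hypothetical data).
PARSED_SENTENCES = [
    [("subj", "publish", "scientist"), ("obj", "publish", "paper")],
    [("subj", "publish", "scientist"), ("obj", "publish", "paper")],
    [("subj", "publish", "scientist"), ("obj", "publish", "book")],
    [("subj", "publish", "author"), ("obj", "publish", "book")],
    [("subj", "write", "author"), ("obj", "write", "book")],
    [("subj", "cook", "chef"), ("obj", "cook", "meal")],
]

frames = [f for sentence in PARSED_SENTENCES for f in extract_frames(sentence)]
stats = aggregate(frames)
```

With this toy data, `pmi("cook", "meal", stats)` is higher than `pmi("publish", "paper", stats)`, because "cook" and "meal" occur only with each other. That is the kind of implied association an aggregate statistic can surface from individually shallow frames.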