A knowledge-extraction approach to identify and present verbatim quotes in free text

Authors:
Gerhard Paaß;Andre Bergholz;Anja Pilz
Affiliations:
Fraunhofer Institute Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, Germany;Fraunhofer Institute Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, Germany;Fraunhofer Institute Intelligent Analysis and Information Systems (IAIS), Schloss Birlinghoven, Germany
Venue:
Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Year:
2012

Abstract

In news stories, verbatim quotes of persons play a very important role, as they carry reliable information about that person's opinion on specific aspects. As thousands of new quotes are published every hour, it is very difficult to keep track of them. In this paper we describe a set of algorithms that solve the knowledge management problem of identifying, storing and accessing verbatim quotes. We treat the verbatim quote task as a relation extraction problem over unstructured text. A workflow of knowledge extraction algorithms provides the features required by the relation extraction algorithm. The central relation extraction procedure is trained on manually annotated documents. It turns out that structural grammatical information improves the F-value for verbatim quote detection to 84.1%, which is sufficient for many exploratory applications. We present the results in a smartphone app connected to a web server, which employs a number of algorithms, such as linkage to Wikipedia, topic extraction and search engine indices, to provide flexible access to the extracted verbatim quotes.
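The abstract reports detection quality as an F-value. The sketch below shows how quote-speaker relations might be scored as sets. It assumes the balanced F-measure (F1) and exact matching of speaker and quote span. The paper's actual evaluation protocol is not given here, and the names `QuoteRelation` and `precision_recall_f1` are hypothetical.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class QuoteRelation:
    """Hypothetical relation instance: a speaker linked to a quoted text span.

    frozen=True makes instances hashable, so they can be stored in sets.
    """
    speaker: str  # normalized speaker name (assumption)
    start: int    # character offset where the quote begins
    end: int      # character offset where the quote ends (exclusive)


def precision_recall_f1(predicted: Set[QuoteRelation],
                        gold: Set[QuoteRelation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure over relation sets.

    A prediction counts as correct only if speaker and both span offsets
    match a gold relation exactly (an assumed, strict matching criterion).
    """
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {QuoteRelation("PERSON_A", 10, 52), QuoteRelation("PERSON_B", 120, 170)}
    predicted = {QuoteRelation("PERSON_A", 10, 52), QuoteRelation("PERSON_C", 200, 240)}
    print(precision_recall_f1(predicted, gold))
```

In this toy example, one of the two predictions matches a gold relation, so precision, recall and F1 are all 0.5. Exact span matching is the strictest choice. An evaluation could instead credit partial overlaps of quote spans, which would generally yield higher scores.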