Summaries on the fly: query-based extraction of structured knowledge from web documents

Authors:
Besnik Fetahu; Bernardo Pereira Nunes; Stefan Dietze
Affiliations:
L3S Research Center, Leibniz University Hannover, Germany; L3S Research Center, Leibniz University Hannover, Germany, and Department of Informatics, PUC-Rio, Rio de Janeiro, RJ, Brazil; L3S Research Center, Leibniz University Hannover, Germany
Venue:
ICWE'13 Proceedings of the 13th international conference on Web Engineering
Year:
2013

Abstract

A large part of Web resources consists of unstructured textual content, and processing and retrieving the content relevant to a particular information need is challenging for both machines and humans. Information retrieval techniques detect suitable resources for a particular query, information extraction techniques extract structured data, and text summarization detects important sentences. However, these techniques usually do not consider particular user interests and information needs. In this paper, we present a novel method that automatically generates structured summaries from user queries, using POS (part-of-speech) patterns to identify relevant statements and entities in a given context. We evaluate the method on the publicly available New York Times corpus; the results show its applicability and its advantages over previous work.
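
The abstract does not say which POS patterns the method uses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single hypothetical pattern (noun phrase, verb group, noun phrase), tokens that are already tagged with Penn Treebank tags, and a plain substring match against the query. The names `extract_statements`, `is_noun_part`, `is_verb`, and `take` are made up for illustration.

```python
from typing import Callable, List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag)


def is_noun_part(tag: str) -> bool:
    # Assumption: a noun phrase is a run of determiners, adjectives, and nouns.
    return tag in {"DT", "JJ"} or tag.startswith("NN")


def is_verb(tag: str) -> bool:
    return tag.startswith("VB")


def take(tags: List[str], start: int, pred: Callable[[str], bool]) -> int:
    """Return the index just past the run of tags matching pred from start."""
    end = start
    while end < len(tags) and pred(tags[end]):
        end += 1
    return end


def extract_statements(sentence: List[Tagged], query: str) -> List[Tuple[str, str, str]]:
    """Collect (subject, verb, object) spans that mention the query string."""
    tokens = [tok for tok, _ in sentence]
    tags = [tag for _, tag in sentence]
    results = []
    i = 0
    while i < len(tags):
        s_end = take(tags, i, is_noun_part)      # subject noun phrase
        v_end = take(tags, s_end, is_verb)       # verb group
        o_end = take(tags, v_end, is_noun_part)  # object noun phrase
        if i < s_end < v_end < o_end:            # all three parts are non-empty
            triple = (
                " ".join(tokens[i:s_end]),
                " ".join(tokens[s_end:v_end]),
                " ".join(tokens[v_end:o_end]),
            )
            # Keep only statements relevant to the user query.
            if query.lower() in " ".join(triple).lower():
                results.append(triple)
            i = o_end
        else:
            # No match here: skip the whole noun run (or one token).
            i = max(s_end, i + 1)
    return results


# Hand-tagged example sentence (the tags are supplied manually, not by a tagger).
sentence = [
    ("Barack", "NNP"), ("Obama", "NNP"), ("visited", "VBD"),
    ("the", "DT"), ("New", "NNP"), ("York", "NNP"), ("Times", "NNP"),
    ("office", "NN"), (".", "."),
]
statements = extract_statements(sentence, "obama")
```

In a real pipeline the tags would come from a POS tagger and the relevance test would likely be more involved than a substring match; both are simplified here to keep the sketch self-contained.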