Harvesting facts from textual web sources by constrained label propagation

Authors:
Yafang Wang; Bin Yang; Lizhen Qu; Marc Spaniol; Gerhard Weikum
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany (all authors)
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

There have been major advances in automatically constructing large knowledge bases by extracting relational facts from Web and text sources. However, the world is dynamic: periodic events such as sports competitions need to be interpreted with their respective timepoints, and facts such as coaching a sports team, holding a political or business position, or even being married do not hold forever and should be augmented with their respective timespans. This paper addresses the problem of automatically harvesting temporal facts with such extended time-awareness. We employ pattern-based gathering techniques for fact candidates and construct a weighted pattern-candidate graph. Our key contribution is a system called PRAVDA, based on a new kind of label propagation algorithm with a judiciously designed loss function, which iteratively processes the graph to label good temporal facts for a given set of target relations. Our experiments with online news and Wikipedia articles demonstrate the accuracy of this method.
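
To make the idea of labeling a pattern-candidate graph concrete, the following is a minimal sketch of plain label propagation with clamped seeds. It is not PRAVDA's algorithm: the abstract does not give the loss function or the constraints, so this uses the textbook weighted-average update instead. The node names, edge weights, labels, and the function name `propagate_labels` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, labels, iterations=50):
    """Plain label propagation (assumed stand-in, not the paper's method).

    edges:  list of (node_a, node_b, weight) tuples
    seeds:  {node: known_label} for the few labeled nodes
    labels: list of all label names
    """
    # Build a symmetric weighted adjacency map.
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Seeds put all mass on their known label; other nodes start uniform.
    scores = {}
    for node in neighbors:
        if node in seeds:
            scores[node] = {lab: float(lab == seeds[node]) for lab in labels}
        else:
            scores[node] = {lab: 1.0 / len(labels) for lab in labels}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = scores[node]  # seeds stay clamped
                continue
            total = sum(nbrs.values())
            # New score: weighted average of the neighbors' current scores.
            updated[node] = {
                lab: sum(w * scores[n][lab] for n, w in nbrs.items()) / total
                for lab in labels
            }
        scores = updated
    return scores


# Hypothetical toy graph: fact candidates ("cand:") linked to the textual
# patterns ("pat:") they occur with; weights are made-up co-occurrence counts.
edges = [
    ("cand:Ferguson|ManUtd", "pat:coached", 3.0),
    ("cand:Mourinho|Chelsea", "pat:coached", 2.0),
    ("cand:Mourinho|Chelsea", "pat:visited", 1.0),
    ("cand:Obama|Berlin", "pat:visited", 2.0),
]
seeds = {"cand:Ferguson|ManUtd": "coachOf", "cand:Obama|Berlin": "none"}
labels = ["coachOf", "none"]

scores = propagate_labels(edges, seeds, labels)
best = max(scores["cand:Mourinho|Chelsea"], key=scores["cand:Mourinho|Chelsea"].get)
print("cand:Mourinho|Chelsea ->", best)
```

In this toy graph the unlabeled candidate shares a heavily weighted pattern with the `coachOf` seed, so propagation pulls it toward that label. A system like the one described would additionally have to attach timepoints or timespans to the facts, which this sketch does not attempt.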