Knowledge graphs provide a powerful representation of entities and the relationships between them, but automatically constructing such graphs from noisy extractions presents numerous challenges. Knowledge graph identification (KGI) is a technique for knowledge graph construction that jointly reasons about entities, attributes, and relations in the presence of uncertain inputs and ontological constraints. Although KGI shows promise in scaling to knowledge graphs built from millions of extractions, increasingly powerful extraction engines may soon require knowledge graphs built from billions of extractions. One tool for scaling is to partition the extractions so that reasoning can occur in parallel. We explore partitioning approaches that leverage both ontological and distributional information. We compare these techniques with hash-based approaches and show that a richer partitioning model, one that incorporates the ontology graph and the distribution of extractions, provides superior results. Our results demonstrate that partitioning can yield order-of-magnitude speedups without reducing model performance.
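To make the contrast between hash-based and ontology-aware partitioning concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes each extraction is a dictionary with hypothetical `entity` and `label` fields, and that a mapping from ontology labels to partitions (`label_to_part`, also hypothetical) has already been computed, for example by running a graph partitioner over the ontology graph weighted by extraction counts.

```python
import zlib
from collections import defaultdict


def hash_partition(extractions, k):
    """Baseline: assign each extraction to one of k partitions by hashing its entity."""
    parts = defaultdict(list)
    for ext in extractions:
        # crc32 is stable across runs, unlike Python's built-in hash() on strings.
        parts[zlib.crc32(ext["entity"].encode("utf-8")) % k].append(ext)
    return parts


def ontology_partition(extractions, label_to_part):
    """Assign each extraction to the partition that owns its ontology label.

    label_to_part is assumed to be precomputed (hypothetical input), so that
    labels linked by ontological constraints tend to share a partition.
    """
    parts = defaultdict(list)
    for ext in extractions:
        parts[label_to_part[ext["label"]]].append(ext)
    return parts


# Illustrative data only; the entities and labels are made up for this sketch.
extractions = [
    {"entity": "kyrgyzstan", "label": "country"},
    {"entity": "bishkek", "label": "city"},
    {"entity": "serena_williams", "label": "athlete"},
    {"entity": "tennis", "label": "sport"},
]
label_to_part = {"country": 0, "city": 0, "athlete": 1, "sport": 1}

by_hash = hash_partition(extractions, k=2)
by_ontology = ontology_partition(extractions, label_to_part)
```

In this sketch the hash-based scheme ignores labels entirely, so extractions whose labels are tied together by an ontological constraint may land in different partitions. The label-based scheme keeps them together whenever the assumed label mapping does, and each resulting partition could then be handed to a separate inference worker.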