Harvesting relational facts from Web sources has received great attention as a way of automatically constructing large knowledge bases. State-of-the-art approaches combine pattern-based gathering of fact candidates with constraint-based reasoning. However, they still face major challenges in trading off precision, recall, and scalability. Techniques that scale well are susceptible to noisy patterns that degrade precision, while techniques that employ deep reasoning for high precision cannot cope with Web-scale data. This paper presents a scalable system, called PROSPERA, for high-quality knowledge harvesting. We propose a new notion of n-gram-itemsets for richer patterns, and use MaxSat-based constraint reasoning on both the quality of patterns and the validity of fact candidates. We compute pattern-occurrence statistics for two benefits: they serve to prune the hypothesis space and to derive informative clause weights for the reasoner. The paper shows how to incorporate these building blocks into a scalable architecture that can parallelize all phases on a Hadoop-based distributed platform. Our experiments with the ClueWeb09 corpus include comparisons to the recent ReadTheWeb experiment. We substantially outperform these prior results in recall at the same precision, with low run-times.
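The abstract names three building blocks: n-gram-itemset patterns, pattern-occurrence statistics that both prune hypotheses and weight clauses, and MaxSat-based reasoning over fact candidates. The sketch below wires them together on a toy birthplace relation. It is an illustrative reading under our own assumptions, not PROSPERA's implementation: the helpers `ngram_set`, `pattern_stats`, `build_clauses`, and `max_sat` are hypothetical, as are the subset-matching rule, the Snowball-style confidence (positives / (positives + negatives) measured against seed facts), the `min_support` threshold, and the hard-constraint weight of 10. The exhaustive MaxSat search is only feasible for a handful of variables; weighted MaxSat is NP-hard, so a Web-scale system needs an approximate solver.

```python
from itertools import product

def ngram_set(text, n=2):
    """Word n-grams of a text, as a set: order *between* n-grams is dropped.
    Note: a text shorter than n yields an empty set, which matches everything."""
    toks = text.lower().split()
    return frozenset(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def pattern_stats(pattern, occurrences, seeds):
    """Support and Snowball-style confidence of one n-gram-itemset pattern.

    A pattern matches an occurrence if all of its n-grams occur in the context
    (a subset test). A matched pair (s, o) is positive if seeds[s] == o, and
    negative if s is a seed subject with a different object.
    """
    matched = [pair for pair, ctx in occurrences if pattern <= ngram_set(ctx)]
    pos = sum(1 for s, o in matched if seeds.get(s) == o)
    neg = sum(1 for s, o in matched if s in seeds and seeds[s] != o)
    conf = pos / (pos + neg) if pos + neg else 0.0
    return len(matched), conf, matched

def build_clauses(patterns, occurrences, seeds, min_support=2, hard=10.0):
    """Prune rare patterns, then emit weighted clauses over fact candidates.

    A clause is (weight, [(fact, polarity), ...]) and is satisfied if any
    literal holds. Evidence clauses are weighted by pattern confidence; seeds
    and the functional constraint "one object per subject" get weight `hard`.
    """
    clauses, facts = [], set()
    for pattern in patterns:
        support, conf, matched = pattern_stats(pattern, occurrences, seeds)
        if support < min_support:
            continue  # occurrence statistics prune the hypothesis space
        for fact in matched:
            facts.add(fact)
            clauses.append((conf, [(fact, True)]))
    for s, o in seeds.items():
        if (s, o) in facts:
            clauses.append((hard, [((s, o), True)]))
    for f in facts:
        for g in facts:
            if f < g and f[0] == g[0]:  # same subject, different objects
                clauses.append((hard, [(f, False), (g, False)]))
    return sorted(facts), clauses

def max_sat(facts, clauses):
    """Exhaustive weighted MaxSat: return the assignment with the largest
    total weight of satisfied clauses (tiny instances only)."""
    best, best_w = None, -1.0
    for values in product([False, True], repeat=len(facts)):
        assign = dict(zip(facts, values))
        w = sum(wt for wt, lits in clauses
                if any(assign[f] == pol for f, pol in lits))
        if w > best_w:
            best, best_w = assign, w
    return best, best_w

# Toy input for a hypothetical bornIn relation.
seeds = {"Bohr": "Copenhagen", "Einstein": "Ulm"}
occurrences = [
    (("Bohr", "Copenhagen"), "was born in"),
    (("Einstein", "Ulm"), "was born in the town of"),
    (("Einstein", "Princeton"), "lived in"),
    (("Curie", "Warsaw"), "was born in"),
    (("Curie", "Paris"), "lived in"),
    (("Bohr", "Copenhagen"), "lived in"),
]
patterns = [ngram_set("was born in"), ngram_set("lived in"), ngram_set("moved to")]
facts, clauses = build_clauses(patterns, occurrences, seeds)
assignment, weight = max_sat(facts, clauses)
accepted = [f for f in facts if assignment[f]]
print(accepted)
```

On this input, "moved to" never matches and is pruned, "lived in" gets confidence 0.5 because it also links Einstein to Princeton, and the reasoner accepts (Bohr, Copenhagen), (Curie, Warsaw), and (Einstein, Ulm) while rejecting the low-weight "lived in" candidates that violate the one-birthplace constraint.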