This paper gives an overview of the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests the infoboxes and category names of Wikipedia for facts about individual entities and reconciles them with the taxonomic backbone of WordNet to ensure that every entity has proper classes and that the class system is consistent. The YAGO knowledge base currently contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and NAGA, the query engine for YAGO. It also discusses ongoing work on extensions that integrate fact candidates extracted from natural-language text sources.
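The abstract does not spell out how the consistency checking works. As a rough illustration only, the sketch below shows one simple way a candidate binary fact could be checked against a class taxonomy: each relation has a declared domain and range, and a fact is kept only if its subject and object belong to those classes or to one of their subclasses. The taxonomy, the entity types, the relation signature, and all function names are hypothetical toy values chosen for this example; they are not taken from YAGO and are not the toolkit's actual API.

```python
from typing import Dict, List, Optional, Set, Tuple

# A fact is an instance of a binary relation: (subject, relation, object).
Fact = Tuple[str, str, str]

# Hypothetical toy taxonomy (class -> direct superclass), not YAGO's real one.
SUBCLASS_OF: Dict[str, str] = {
    "physicist": "scientist",
    "scientist": "person",
    "city": "location",
}

# Hypothetical entity typing (entity -> most specific class).
TYPE_OF: Dict[str, str] = {
    "Albert_Einstein": "physicist",
    "Ulm": "city",
}

# Hypothetical relation signatures (relation -> (domain class, range class)).
SIGNATURES: Dict[str, Tuple[str, str]] = {
    "bornIn": ("person", "location"),
}


def superclasses(cls: str) -> Set[str]:
    """Return cls together with all of its ancestors in the toy taxonomy."""
    seen: Set[str] = set()
    current: Optional[str] = cls
    # The 'not in seen' guard stops the walk if the taxonomy contains a cycle.
    while current is not None and current not in seen:
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return seen


def is_instance_of(entity: str, cls: str) -> bool:
    """True if the entity's class is cls or a (transitive) subclass of cls."""
    direct = TYPE_OF.get(entity)
    return direct is not None and cls in superclasses(direct)


def check_fact(fact: Fact) -> bool:
    """Accept a fact only if its arguments match the relation's signature."""
    subject, relation, obj = fact
    if relation not in SIGNATURES:
        return False  # unknown relation: reject rather than guess
    domain, range_ = SIGNATURES[relation]
    return is_instance_of(subject, domain) and is_instance_of(obj, range_)


def filter_consistent(candidates: List[Fact]) -> List[Fact]:
    """Keep only the candidate facts that pass the type check."""
    return [fact for fact in candidates if check_fact(fact)]
```

With these toy values, the candidate `("Albert_Einstein", "bornIn", "Ulm")` passes the check, while the reversed triple `("Ulm", "bornIn", "Albert_Einstein")` is rejected because a city is not a person.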