Catriple: Extracting Triples from Wikipedia Categories

Authors:
Qiaoling Liu;Kaifeng Xu;Lei Zhang;Haofen Wang;Yong Yu;Yue Pan
Affiliations:
Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China 200240;Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China 200240;IBM China Research Lab, Beijing, China 100094;Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China 200240;Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China 200240;IBM China Research Lab, Beijing, China 100094
Venue:
ASWC '08 Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web
Year:
2008

Abstract

As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple describes Wikipedia articles and their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on the infoboxes and suffer from low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which demands considerable manual effort and exploits only a small part of the knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve wider article coverage with less manual effort. We realize this goal by utilizing the syntax and semantics of super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples at 12 confidence levels ranging from 47.0% to 96.4%, which together cover 78.2% of Wikipedia articles. Among them, 1.27M triples have a confidence of 96.4%. Applications can select triples at whatever confidence level suits their needs.
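To make the idea of a super-sub category pair concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a single hypothetical naming pattern in which the super-category reads "X by P" and the sub-category reads "V X", so that each article in the sub-category yields the triple (article, P, V). The category names, the function names `split_super_category`, `extract_triples` and `filter_by_confidence`, and the way a confidence value is attached are all illustrative assumptions; the abstract does not specify the actual patterns or how confidence is computed.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str       # Wikipedia article title
    prop: str          # property name taken from the super-category
    value: str         # property value taken from the sub-category
    confidence: float  # illustrative score; how it is estimated is not given in the abstract


def split_super_category(super_cat: str) -> Optional[Tuple[str, str]]:
    """Split a name such as 'Songs by artist' into ('Songs', 'artist').

    Hypothetical pattern: only the literal ' by ' separator is handled.
    """
    head, sep, prop = super_cat.partition(" by ")
    if not sep or not head or not prop:
        return None
    return head, prop


def extract_triples(super_cat: str, sub_cat: str,
                    articles: List[str], confidence: float) -> List[Triple]:
    """Emit one triple per article if the pair matches the assumed pattern."""
    parsed = split_super_category(super_cat)
    if parsed is None:
        return []
    head, prop = parsed
    suffix = " " + head.lower()
    # Assumed rule: the sub-category must end with the super-category's head noun.
    if not sub_cat.lower().endswith(suffix):
        return []
    value = sub_cat[: len(sub_cat) - len(suffix)].strip()
    if not value:
        return []
    return [Triple(article, prop, value, confidence) for article in articles]


def filter_by_confidence(triples: List[Triple], threshold: float) -> List[Triple]:
    """Keep only the triples whose confidence meets the caller's threshold."""
    return [t for t in triples if t.confidence >= threshold]


if __name__ == "__main__":
    found = extract_triples("Songs by artist", "The Beatles songs",
                            ["Hey Jude", "Yesterday"], confidence=0.964)
    for triple in filter_by_confidence(found, 0.9):
        print(triple)
```

An application that needs high precision could pass a threshold near the top of the reported range, while one that favors coverage could lower it toward 47.0%.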