Learning Chinese Entity Attributes from Online Encyclopedia

Authors:
Yidong Chen;Liwei Chen;Kun Xu
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China
Venue:
APWeb'12: Proceedings of the 14th International Conference on Web Technologies and Applications
Year:
2012

Abstract

Automatically constructing knowledge bases from free online encyclopedias is considered a crucial step in many Internet-related areas. However, current research focuses mainly on extracting knowledge facts from English resources, and there is comparatively little work on other languages. In this paper, we describe an approach to extracting entity attributes from HudongBaike, a free Chinese online encyclopedia. We first identified attribute-value pairs in HudongBaike pages that contain infoboxes, and used these pairs to learn which attributes matter for different HudongBaike entries. We then adopted a keyword matching approach to identify candidate sentences for each attribute in plain HudongBaike articles. Finally, we trained a CRF (conditional random field) model to extract the corresponding values from these candidate sentences. Our approach is simple but effective, and our experiments show that it is possible to produce a large number of triples from free online encyclopedias, which can then be used to construct Chinese knowledge bases with less human supervision.
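As a rough illustration of the three stages described above, the following Python sketch may help. It is not the authors' implementation, and several parts are assumptions. The attribute-selection rule (keep an attribute if it appears in at least `min_ratio` of a category's infobox pages) is a guess, because the abstract does not say how attributes are chosen. The keyword lists are hypothetical. The CRF itself is not trained here: hand-written B/I/O tags stand in for what a trained tagger might output. The helper names `learn_attributes`, `candidate_sentences` and `extract_values`, and all toy data, are invented for illustration.

```python
import re
from collections import Counter, defaultdict


# Hypothetical helper (assumption): keep an attribute for a category if it
# appears in at least `min_ratio` of that category's infobox pages.
def learn_attributes(infobox_pages, min_ratio=0.6):
    counts = defaultdict(Counter)
    totals = Counter()
    for category, pairs in infobox_pages:
        totals[category] += 1
        counts[category].update(set(pairs))  # count each attribute once per page
    return {
        cat: sorted(a for a, n in counts[cat].items() if n / totals[cat] >= min_ratio)
        for cat in totals
    }


# Hypothetical helper: split an article on Chinese sentence-final punctuation
# and keep, for each attribute, the sentences containing one of its keywords.
def candidate_sentences(article, attributes, keywords):
    sentences = [s for s in re.split(r"(?<=[。！？])", article) if s.strip()]
    result = {}
    for attr in attributes:
        kws = keywords.get(attr, [attr])  # fall back to the attribute name itself
        result[attr] = [s for s in sentences if any(k in s for k in kws)]
    return result


# Hypothetical helper: turn per-character B/I/O tags (the kind of output a
# trained CRF tagger might produce) into extracted value strings.
def extract_values(chars, tags):
    values, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            if current:
                values.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            current.append(ch)
        else:  # "O" (or an "I" with no open span): close any open span
            if current:
                values.append("".join(current))
            current = []
    if current:
        values.append("".join(current))
    return values


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pages = [
        ("人物", {"出生日期": "1881年", "国籍": "中国"}),
        ("人物", {"出生日期": "1893年", "职业": "作家"}),
    ]
    attrs = learn_attributes(pages)
    print(attrs)  # {'人物': ['出生日期']}
    article = "鲁迅出生于1881年。他是中国现代文学的奠基人。"
    print(candidate_sentences(article, attrs["人物"], {"出生日期": ["出生"]}))
    sentence = "鲁迅出生于1881年。"
    # Hand-written tags standing in for CRF output.
    tags = ["O"] * 5 + ["B", "I", "I", "I", "I", "O"]
    print(extract_values(list(sentence), tags))  # ['1881年']
```

Tagging per character is a design choice made here only because Chinese text has no whitespace word boundaries; the paper's actual tokenization, tag set and CRF features are not described in the abstract.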