Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language

Authors:
Yuchul Jung;Joo-Young Lee;Youngho Kim;Jaehyun Park;Sung-Hyon Myaeng;Hae-Chang Rim
Affiliations:
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, Daejeon, 305-732, Korea;Department of Computer Science and Engineering, Korea University 1, 5-ka, Anam-dong, Seongbuk-Gu, Seoul 136-701, Korea;School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, Daejeon, 305-732, Korea;Department of Computer Science and Engineering, Korea University 1, 5-ka, Anam-dong, Seongbuk-Gu, Seoul 136-701, Korea;School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, Daejeon, 305-732, Korea;Department of Computer Science and Engineering, Korea University 1, 5-ka, Anam-dong, Seongbuk-Gu, Seoul 136-701, Korea
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2007

Abstract

This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting an existing English one, ConceptNet. The English knowledge base is essentially a large network of concepts and relations. Its triplets, in the form Concept-Relation-Concept, were extracted from English sentences that volunteers interested in contributing commonsense knowledge entered through a Web site. Our work attempts to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions, i.e., triplets of concept-relation-concept, in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgment of a 5% sample, the accuracy was 84.4%.
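
As a rough illustration of the Concept-Relation-Concept form described in the abstract, the sketch below models an assertion as a triple and converts it with a lookup table. It is not the authors' method. The `Assertion` class, the `TOY_LEXICON` dictionary, and the `convert`, `convert_all`, and `coverage` functions are all hypothetical names introduced here. The toy lexicon stands in for the machine translation software and morphological analyzer, and out-of-vocabulary concepts are simply dropped, whereas the paper applies its own handling methods to them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Assertion:
    """One Concept-Relation-Concept triple (hypothetical representation)."""
    left: str
    relation: str
    right: str


# Hypothetical toy lexicon standing in for real translation resources.
TOY_LEXICON: Dict[str, str] = {"dog": "개", "animal": "동물", "bark": "짖다"}


def convert(a: Assertion, lexicon: Dict[str, str]) -> Optional[Assertion]:
    """Translate both concepts; return None if either is out of vocabulary."""
    left = lexicon.get(a.left)
    right = lexicon.get(a.right)
    if left is None or right is None:
        return None  # simplification: the paper handles such cases instead
    # The relation label is kept unchanged in this sketch.
    return Assertion(left, a.relation, right)


def convert_all(assertions: List[Assertion],
                lexicon: Dict[str, str]) -> List[Assertion]:
    """Convert every assertion that the lexicon fully covers."""
    converted = []
    for a in assertions:
        k = convert(a, lexicon)
        if k is not None:
            converted.append(k)
    return converted


def coverage(n_converted: int, n_source: int) -> float:
    """Fraction of source assertions that produced a converted assertion."""
    return n_converted / n_source


if __name__ == "__main__":
    english = [
        Assertion("dog", "IsA", "animal"),
        Assertion("dog", "CapableOf", "bark"),
        Assertion("dog", "IsA", "quadruped"),  # not in the toy lexicon
    ]
    for k in convert_all(english, TOY_LEXICON):
        print(k)
    # Figures from the abstract: about 200,000 of 2.4 million assertions.
    print(f"{coverage(200_000, 2_400_000):.1%}")
```

With the figures reported in the abstract, the converted set corresponds to roughly 8% of the English assertions. The 84.4% accuracy refers to a manually judged 5% sample, which this sketch does not model.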