Concept vector extraction from Wikipedia category network

Authors:
Masumi Shirakawa;Kotaro Nakayama;Takahiro Hara;Shojiro Nishio
Affiliations:
Osaka Univ., Suita, Osaka, Japan;Tokyo Univ., Bunkyo-ku, Tokyo, Japan;Osaka Univ., Suita, Osaka, Japan;Osaka Univ., Suita, Osaka, Japan
Venue:
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Year:
2009

Abstract

Machine-readable taxonomies have proven useful in applications such as document classification and information retrieval. A major line of research on automated taxonomy extraction is Web mining based on statistical NLP (natural language processing), and a significant number of studies have been conducted. However, existing work on automatic dictionary building suffers from accuracy problems due to the technical limitations of statistical NLP and noisy data on the WWW. To solve these problems, this work focuses on mining Wikipedia, a large-scale Web encyclopedia. Wikipedia offers a large number of high-quality articles and a category system, because users around the world edit and refine both daily. Using Wikipedia avoids the loss of accuracy that stems from NLP. However, affiliation relations cannot be extracted automatically by simply descending the category system, since the category system in Wikipedia is not a tree but a network. We propose concept vectorization methods that are applicable to Wikipedia's category network.
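
The abstract does not spell out how the concept vectors are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a toy, hypothetical category network (`PARENTS`) in which a node can have several parent categories, and it builds a vector by spreading weight upward with an assumed per-hop `decay`, summing contributions that reach the same category along different paths. The function name `concept_vector`, the decay scheme, and the depth limit are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy category network: node -> list of parent categories.
# Some nodes have more than one parent, so this is a network, not a tree.
PARENTS = {
    "Python (language)": ["Programming languages", "Free software"],
    "Programming languages": ["Computer science"],
    "Free software": ["Software", "Computer science"],
    "Software": ["Computing"],
    "Computer science": ["Computing"],
    "Computing": [],
}


def concept_vector(node, parents, decay=0.5, max_depth=3):
    """Spread a weight of 1.0 from `node` up to its ancestor categories.

    At each hop the weight is multiplied by `decay` and split evenly among
    the parents. Weights arriving along different paths are summed, which
    is what a plain tree descent would not capture. `max_depth` bounds the
    walk, so the loop also terminates if the network contains cycles.
    """
    vector = defaultdict(float)
    frontier = {node: 1.0}
    for _ in range(max_depth):
        next_frontier = defaultdict(float)
        for current, weight in frontier.items():
            current_parents = parents.get(current, [])
            for parent in current_parents:
                share = weight * decay / len(current_parents)
                vector[parent] += share
                next_frontier[parent] += share
        if not next_frontier:
            break
        frontier = next_frontier
    return dict(vector)


if __name__ == "__main__":
    for category, weight in sorted(concept_vector("Python (language)", PARENTS).items()):
        print(f"{category}: {weight}")
```

In this toy network, "Computer science" is reached both through "Programming languages" and through "Free software", so its weight combines two paths. That is the kind of multi-parent affiliation a simple top-down descent of a tree would miss.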