Towards automatic classification of Wikipedia content

  • Authors:
  • Julian Szymański

  • Affiliations:
  • Gdańsk University of Technology, Gdańsk, Poland

  • Venue:
  • IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning
  • Year:
  • 2010

Abstract

Wikipedia, the free encyclopedia, faces the problem of correctly classifying new articles every day. Articles are assigned to categories manually, which is time-consuming. The task requires knowledge of Wikipedia's category structure that is beyond the competence of a typical editor, which leads to human errors: articles are left out of categories or assigned to the wrong ones. This article presents the application of an SVM classifier to the automatic classification of Wikipedia documents. The classifier was tested with two text representations: inter-document connections (hyperlinks) and word content. The results of experiments evaluated on hand-crafted data show that the Wikipedia classification process can be partially automated. The proposed approach can be used to build a decision support system that suggests to editors the categories that best fit new content entered into Wikipedia.
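The abstract does not give the feature extraction, kernel, or SVM implementation used in the experiments, so the following is only a minimal sketch of the general idea under stated assumptions. It builds the two representations named above (word content and hyperlinks) for a handful of invented toy articles and trains a linear SVM on each. The articles, the `link:` feature prefix, the function names, and the Pegasos-style subgradient training loop are all illustrative assumptions, not the author's method.

```python
import random
import re
from collections import Counter

# Hypothetical toy articles: (text, outgoing links, label).
# Label +1 stands for a "musical instruments" category, -1 for "particles".
ARTICLES = [
    ("The piano is a keyboard instrument played in concerts",
     ["Keyboard", "Concert"], 1),
    ("A violin is a string instrument used in orchestras",
     ["String_instrument", "Orchestra"], 1),
    ("The guitar is a string instrument played in concerts",
     ["String_instrument", "Concert"], 1),
    ("A proton is a particle found in the atomic nucleus",
     ["Particle", "Atomic_nucleus"], -1),
    ("The electron is a particle with negative charge",
     ["Particle", "Electric_charge"], -1),
    ("A neutron is a particle found in the atomic nucleus",
     ["Particle", "Atomic_nucleus"], -1),
]

def word_features(text):
    """Word-content representation: counts of lowercase tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def link_features(links):
    """Hyperlink representation: one feature per linked article title."""
    return Counter("link:" + title for title in links)

def score(weights, features):
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

def train_linear_svm(samples, lam=0.1, epochs=500, seed=0):
    """Linear SVM without a bias term, trained by Pegasos-style
    stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    weights = {}
    step = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for index in order:
            features, label = samples[index]
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            violated = label * score(weights, features) < 1.0
            # Shrink all weights (gradient of the L2 regularizer).
            for name in weights:
                weights[name] *= 1.0 - eta * lam
            # Move towards the sample only if it violates the margin.
            if violated:
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + eta * label * value
    return weights

def predict(weights, features):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if score(weights, features) >= 0.0 else -1

def training_accuracy(extract):
    """Train on ARTICLES with the given feature extractor and report
    the fraction of the same articles that is classified correctly."""
    samples = [(extract(article), article[2]) for article in ARTICLES]
    weights = train_linear_svm(samples)
    correct = sum(predict(weights, f) == y for f, y in samples)
    return correct / len(samples)

by_words = training_accuracy(lambda article: word_features(article[0]))
by_links = training_accuracy(lambda article: link_features(article[1]))
print("word content:", by_words, "hyperlinks:", by_links)
```

In a decision support setting, the raw value returned by `score` could be used to rank candidate categories for a new article instead of forcing a single hard decision. Accuracy on the training articles says nothing about performance on real Wikipedia data, which would need a proper held-out evaluation.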