Recovering "lack of words" in text categorization for item banks

  • Authors:
  • Atom Nuntiyagul;Nick Cercone;Kanlaya Naruedomkul

  • Affiliations:
  • Inst. for Innovation and Development of Learning Process, Mahidol University, Thailand.;Faculty of Computer Science, Dalhousie University, Canada;Mathematics, Faculty of Science, Mahidol University, Thailand

  • Venue:
  • COMPSAC-W'05 Proceedings of the 29th annual international conference on Computer software and applications conference
  • Year:
  • 2005

Abstract

PKIP (Patterned Keywords in Phrase) is our feature selection approach to text categorization (TC) for item banks. An item bank is a collection of textual data in which each item consists of short sentences and contains only a few words relevant to categorization. Traditional TC techniques cannot provide sufficiently accurate results because of this "lack of words" problem. PKIP improves categorization accuracy and recovers from the "lack of words" problem. Our sample item bank is a collection of Thai primary mathematics problems, and we use SVM as our classifier. Classification results show that PKIP produces acceptable classification performance.