Part of Speech (POS) Tag Sets Reduction and Analysis Using Rough Set Techniques

  • Authors:
  • Mohamed Elhadi; Amjd Al-Tobi

  • Affiliations:
  • Department of Computer Science, Sultan Qaboos University, Oman; Department of Computer Science, Sultan Qaboos University, Oman

  • Venue:
  • RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
  • Year:
  • 2009

Abstract

This work is motivated by an earlier study in which text was transformed into strings of syntactic structures generated by a POS tagger, and a sequence algorithm was applied to those strings to calculate similarity. The performance of these computations was strongly affected by the size of the string, which in turn depends on the type of tags used. Generated tag sets range from a few general tags (a minimum of nine) to hundreds of detailed ones. Determining which tags, and which combinations of tags, affect the realization of the meanings, dependencies, or relationships that exist in the text is therefore an important issue. Reducing the tag set using rough sets, and consequently reducing the strings, improved the efficiency of similarity calculations between documents while maintaining the same level of accuracy. This finding is very encouraging.
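
The abstract does not describe the reduction procedure itself, so the following is only a minimal sketch of the general rough-set idea of attribute reduction, not the authors' method. It treats each tag as a condition attribute in a small decision table and searches for the smallest tag subsets that preserve the dependency of the decision on the full tag set. The table contents, the tag names (`NOUN`, `VERB`, `ADJ`, `DET`), the class labels, and the function names are all hypothetical, chosen for illustration.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into blocks that agree on every attribute in attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())

def dependency(rows, decisions, attrs):
    """Share of rows in the positive region: blocks whose rows all have one decision."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({decisions[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def minimal_reducts(rows, decisions, attrs):
    """Smallest attribute subsets with the same dependency as the full set.

    Exhaustive search; only practical for small tag sets.
    """
    full = dependency(rows, decisions, attrs)
    for size in range(1, len(attrs) + 1):
        found = [set(c) for c in combinations(attrs, size)
                 if dependency(rows, decisions, c) == full]
        if found:
            return found
    return []

# Hypothetical decision table: one row per document, 1 if the tag occurs.
tags = ["NOUN", "VERB", "ADJ", "DET"]
rows = [
    {"NOUN": 1, "VERB": 0, "ADJ": 1, "DET": 1},
    {"NOUN": 1, "VERB": 1, "ADJ": 0, "DET": 1},
    {"NOUN": 0, "VERB": 1, "ADJ": 0, "DET": 1},
    {"NOUN": 0, "VERB": 0, "ADJ": 1, "DET": 1},
    {"NOUN": 1, "VERB": 0, "ADJ": 0, "DET": 1},
]
decisions = ["A", "B", "B", "A", "A"]  # hypothetical document classes

reducts = minimal_reducts(rows, decisions, tags)
print(reducts)

# Dropping every tag outside a reduct shortens the tag string for a document.
sequence = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]
reduced = [t for t in sequence if t in reducts[0]]
print(reduced)
```

In this toy table a single tag is enough to separate the two classes, so the reduced string is much shorter than the original. Real tag sets would likely need a heuristic search instead of enumerating every subset.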