Part of Speech (POS) Tag Sets Reduction and Analysis Using Rough Set Techniques

Authors: Mohamed Elhadi; Amjd Al-Tobi
Affiliations: Department of Computer Science, Sultan Qaboos University, Oman (both authors)
Venue: RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Year: 2009

Abstract

This work is motivated by an earlier study in which text was transformed into strings of syntactic structures, generated by a POS tagger, and compared using a sequence algorithm to calculate similarity. The performance of these computations was strongly affected by the length of the string, which in turn depends on the tag set used. Tag sets range from a few general tags (a minimum of nine) to hundreds of detailed ones. Determining which tags, and which combinations of tags, affect the realization of the meanings, dependencies, or relationships present in the text is therefore an important issue. Reducing the tag set with rough sets, and consequently shortening the strings, improved the efficiency of similarity calculations between documents while maintaining the same level of accuracy. This finding was very encouraging.
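
The effect of a smaller tag set on string length can be illustrated with a minimal sketch. It is not the authors' rough set procedure: the `COARSE` mapping, the `keep` set, and the sample tag sequences are hypothetical, and Python's `difflib.SequenceMatcher` stands in for whichever sequence algorithm the earlier work used.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from detailed Penn Treebank-style tags to coarse
# classes. A rough set analysis would select the tags to keep; here the
# choice is hard-coded purely for illustration.
COARSE = {
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
    "JJ": "J", "JJR": "J", "JJS": "J",
    "RB": "R", "RBR": "R", "RBS": "R",
}


def reduce_tags(tags, keep=frozenset("NVJR")):
    """Map detailed tags to coarse ones and drop tags outside `keep`."""
    coarse = (COARSE.get(tag) for tag in tags)
    return [c for c in coarse if c in keep]


def tag_similarity(tags_a, tags_b):
    """Similarity ratio in [0, 1] between two tag sequences."""
    return SequenceMatcher(None, tags_a, tags_b, autojunk=False).ratio()


# Hypothetical tagger output for two short sentences.
doc_a = ["DT", "NN", "VBD", "IN", "DT", "JJ", "NN"]
doc_b = ["DT", "NNS", "VBP", "IN", "DT", "NN"]

reduced_a = reduce_tags(doc_a)
reduced_b = reduce_tags(doc_b)

print(len(doc_a), "->", len(reduced_a))
print(tag_similarity(doc_a, doc_b), tag_similarity(reduced_a, reduced_b))
```

Shorter tag strings mean less work for the sequence comparison. Whether accuracy is preserved depends on which tags are kept, which is the question the rough set analysis addresses.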