This work is motivated by an earlier study in which text was transformed into strings of syntactic structures, generated by a POS tagger, and compared using a sequence algorithm to calculate similarity. The performance of these computations depended heavily on the size of the string, which in turn depends on the type of tags used. Generated tag sets range from a few general tags (a minimum of nine) to hundreds of detailed ones. Determining which tags, and which combinations of tags, affect the realization of the meanings, dependencies, or relationships present in the text is therefore an important issue. Reducing the tag set using rough sets, and consequently reducing the strings, improved the efficiency of similarity calculations between documents while maintaining the same level of accuracy. This finding is very encouraging.
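To make the idea of tag set reduction concrete, the following is a minimal sketch in Python, not the authors' method. It assumes Penn Treebank-style tags as input and uses a hand-picked, hypothetical mapping (`COARSE`) and drop list (`DROPPED`) in place of a reduction actually derived with rough sets. The standard library's `difflib.SequenceMatcher` stands in for the sequence algorithm, which the abstract does not specify.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from detailed tags to coarse classes.
# Hand-picked for illustration only; NOT derived by rough set analysis.
COARSE = {
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
    "JJ": "J", "JJR": "J", "JJS": "J",
}

# Tags assumed, purely for this sketch, to be removable from the string.
DROPPED = {"DT", "IN", ".", ","}


def reduce_tags(tags):
    """Drop the removable tags and collapse detailed tags into coarse ones."""
    return [COARSE.get(t, t) for t in tags if t not in DROPPED]


def similarity(a, b):
    """Similarity ratio in [0, 1] between two tag sequences."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Two toy tag strings, as a POS tagger might emit for two short sentences.
doc_a = ["DT", "NN", "VBZ", "DT", "JJ", "NN", "."]
doc_b = ["DT", "NNS", "VBD", "DT", "JJR", "NN", "."]

full = similarity(doc_a, doc_b)
reduced = similarity(reduce_tags(doc_a), reduce_tags(doc_b))

print(len(doc_a), len(reduce_tags(doc_a)))  # sequence length before and after reduction
print(round(full, 3), round(reduced, 3))    # similarity before and after reduction
```

The reduced sequences are shorter, so the comparison has less work to do. Whether accuracy is preserved depends entirely on which tags are kept, which is the question the rough set analysis is meant to answer; the mapping above makes no such guarantee.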