Classification of RSS-Formatted documents using full text similarity measures

Authors:
Katarzyna Wegrzyn-Wolska;Piotr S. Szczepaniak
Affiliations:
Ecole Supérieure d'Ingenieurs en Informatique et Génie de Télécommunication, Avon-Fontainebleau, France;Institute of Computer Science, Technical University of Lodz, Lodz, Poland
Venue:
ICWE'05 Proceedings of the 5th international conference on Web Engineering
Year:
2005

Abstract

The Web is an enormous, unbounded, and dynamically changing source of useful and varied information. News is among the most rapidly changing kinds of information. The starting point of this paper is a presentation of RSS, a useful data format frequently used by news publishers; some statistics related to news syndication illustrate the current situation. Then, two recently developed methods for examining the similarity of textual documents are briefly presented. Since RSS-supplied records always contain the same types of information (headlines, links, article summaries, etc.), applying methods of this type makes diverse applications, such as automatic news classification and filtering, easier.
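
To illustrate the general idea, here is a minimal sketch of how the fixed fields of RSS records (headline, link, summary) could feed a similarity-based classifier. It is not the authors' method: the abstract does not specify the two similarity measures, so a plain Jaccard word-overlap score is swapped in as a stand-in. The sample feed, the category word lists, and all function names are hypothetical and invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Central bank raises interest rates</title>
      <link>http://example.com/economy-1</link>
      <description>The bank acts to curb inflation.</description>
    </item>
    <item>
      <title>Home team wins league match</title>
      <link>http://example.com/sport-1</link>
      <description>A late goal decides the match.</description>
    </item>
  </channel>
</rss>"""

# Hypothetical reference texts, one per category (an assumption,
# not taken from the paper).
CATEGORY_EXAMPLES = {
    "economy": "bank interest rates inflation market economy",
    "sport": "match team goal league player win",
}


def parse_items(feed_xml):
    """Extract headline, link, and summary from every <item> element."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: shared words / all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def classify(item, category_examples):
    """Return the category whose reference text is most similar to the item."""
    text = item["title"] + " " + item["summary"]
    return max(category_examples,
               key=lambda name: jaccard(text, category_examples[name]))


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(entry["link"], classify(entry, CATEGORY_EXAMPLES))
```

Because every record exposes the same fields, the classifier can concatenate headline and summary without any per-source handling; a more refined similarity measure could replace `jaccard` without changing the rest of the sketch.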