Are raw RSS feeds suitable for broad issue scanning? A science concern case study

Authors:
Mike Thelwall;Rudy Prabowo;Ruth Fairclough
Affiliations:
School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, United Kingdom;School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, United Kingdom;School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, United Kingdom
Venue:
Journal of the American Society for Information Science and Technology
Year:
2006

Citing 0
Cited 10

A comparison of feature selection methods for an evolving RSS feed corpus. Information Processing and Management: an International Journal - Special issue: Informetrics
Identifying and characterizing public science-related fears from RSS feeds: Research Articles. Journal of the American Society for Information Science and Technology
Bibliometrics to webometrics. Journal of Information Science
Investigation of the accuracy of search engine hit counts. Journal of Information Science
Google stemming mechanisms. Journal of Information Science
Detecting News Event from a Citizen Journalism Website Using Tags. AMT '09 Proceedings of the 5th International Conference on Active Media Technology
Social tags as news event detectors. Journal of Information Science
Sentiment in Twitter events. Journal of the American Society for Information Science and Technology
Characterizing web syndication behavior and content. WISE'11 Proceedings of the 12th international conference on Web information system engineering
visualRSS: a platform to mine and visualise social data from RSS feeds. ICWE'12 Proceedings of the 12th international conference on Current Trends in Web Engineering


Abstract

Broad issue scanning is the task of identifying important public debates arising within a given broad issue, and Really Simple Syndication (RSS) feeds are a natural information source for investigating such issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories on a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: monitoring a large number of feeds may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and to assess the suitability of raw RSS feeds (i.e., feeds without data cleansing) for broad issue scanning. However, an attempt to identify genuine science concern debates in the corpus by investigating the top 1,000 “burst” words found only two genuine debates. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly when there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue. © 2006 Wiley Periodicals, Inc.
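
The abstract does not give the exact burst measure, so the following is only a minimal sketch of the general idea of a coword frequency burst score, not the authors' implementation. It assumes that an item is relevant when it contains at least one issue term, and that a word's burst score is the largest rise in its daily share of relevant items over its share in all earlier days. The function names `tokenize` and `burst_scores`, the integer day index, and the scoring rule are all hypothetical choices made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Return the set of lowercase alphabetic words in a feed item."""
    return set(re.findall(r"[a-z]+", text.lower()))


def burst_scores(items, issue_terms):
    """Score words by their largest one-day rise in co-occurrence share.

    items: iterable of (day, text) pairs, where day is a sortable index.
    issue_terms: words that mark an item as relevant to the broad issue.
    Returns {word: score}; this scoring rule is an assumption, not the
    measure used in the article.
    """
    issue_terms = set(issue_terms)
    totals = Counter()              # relevant items seen on each day
    counts = defaultdict(Counter)   # counts[word][day] = relevant items containing word

    for day, text in items:
        words = tokenize(text)
        if not words & issue_terms:
            continue                # item does not mention the broad issue
        totals[day] += 1
        for word in words - issue_terms:
            counts[word][day] += 1

    days = sorted(totals)
    scores = {}
    for word, per_day in counts.items():
        best = 0.0
        seen_items = 0              # relevant items on all earlier days
        seen_hits = 0               # earlier relevant items containing the word
        for day in days:
            today = per_day[day] / totals[day]
            if seen_items:
                baseline = seen_hits / seen_items
                best = max(best, today - baseline)
            seen_items += totals[day]
            seen_hits += per_day[day]
        scores[word] = best
    return scores
```

Sorting the returned dictionary by score in descending order and keeping the first 1,000 entries would give a list comparable in spirit to the top "burst" words mentioned in the abstract. Because every item counts equally here, a single high-volume feed can dominate the scores, which is the kind of problem the abstract attributes to raw, uncleansed feeds.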