Clustering of technology tweets and the impact of stop words on clusters

  • Authors:
  • Teng-Sheng Moh;Surya Bhagvat

  • Affiliations:
  • San Jose State University, San Jose, CA;San Jose State University, San Jose, CA

  • Venue:
  • Proceedings of the 50th Annual Southeast Regional Conference
  • Year:
  • 2012

Abstract

Twitter started as a means of communicating with friends but has grown far beyond that. Companies now use it to promote new products, the movie industry uses it to promote movies, and a great deal of advertising and branding is tied to it. Most importantly, Twitter is often the first place people search for breaking news. This paper focuses on clustering daily technology news tweets from prominent bloggers and news sites, using TF-IDF weighting and Apache Mahout, and on evaluating how introducing and removing stop words affects clustering quality. The project is restricted to tweets in English.
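To make the stop-word question concrete, here is a minimal Python sketch of TF-IDF weighting followed by a cosine-similarity comparison, run once with stop words kept and once with them removed. It is an illustration only and not the paper's Apache Mahout pipeline: the toy tweets, the stop-word list, the smoothed IDF formula, and the function names `tokenize`, `tfidf_vectors`, and `cosine` are all assumptions introduced here.

```python
import math
from collections import Counter

# Hypothetical toy tweets and stop-word list (not data from the paper).
TWEETS = [
    "the new phone is on sale and the camera is great",
    "the camera on the new phone is great",
    "the startup is on the list and the funding is closed",
]
STOP_WORDS = frozenset({"the", "is", "on", "and", "a", "of", "to"})


def tokenize(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop any stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


def tfidf_vectors(docs, stop_words=frozenset()):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d, stop_words) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, so terms that
        # occur in every document still keep a positive weight.
        vectors.append({
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    for label, stops in (("with stop words", frozenset()),
                         ("stop words removed", STOP_WORDS)):
        vecs = tfidf_vectors(TWEETS, stops)
        print(label)
        print("  phone tweet vs. phone tweet:  ", cosine(vecs[0], vecs[1]))
        print("  phone tweet vs. startup tweet:", cosine(vecs[0], vecs[2]))
```

In this toy data the phone tweet and the startup tweet share only stop words, so their similarity is positive when stop words are kept and drops to zero once they are removed. A clustering algorithm that groups tweets by such distances would therefore see the two topics as more clearly separated after stop-word removal, which is the kind of effect the paper sets out to evaluate.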