Exploratory mining of collaborative social content

  • Authors:
  • Mahashweta Das

  • Affiliations:
  • University of Texas at Arlington, Arlington, TX, USA

  • Venue:
  • Proceedings of the 2013 SIGMOD/PODS Ph.D. Symposium
  • Year:
  • 2013

Abstract

The widespread use and growing popularity of online collaborative content sites (e.g., Yelp, Amazon, IMDB) have created rich resources that users consult when making purchasing decisions about items such as restaurants, e-commerce products, and movies. They have also created new opportunities for producers of such items to improve their business by designing better products, composing succinct advertisement snippets, and building smart personalized recommendation systems. This motivates us to develop a framework for exploratory mining of user feedback on items in collaborative content sites. The feedback associated with one or more items can easily reach hundreds or thousands of ratings, tags, or reviews, an overwhelming amount of information that users may find difficult to cope with. For example, popular restaurants listed on the review site Yelp routinely receive several thousand ratings and reviews. Moreover, most online activities involve interactions between multiple items and different users, and interpreting such complex user-item interactions also becomes intractable. My PhD research develops novel data mining and exploration algorithms that address these challenges by performing aggregate analytics over the available user feedback. Our analysis aims to help (a) content consumers make more informed judgments (e.g., whether a user will enjoy eating at a particular restaurant) and (b) content producers conduct better business (e.g., redesigning a menu to attract more people from a certain demographic group to a restaurant). My dissertation identifies a family of mining tasks and proposes a suite of algorithms (exact algorithms, approximation algorithms with theoretical properties, and efficient heuristics) for solving them.
We conduct a comprehensive set of experiments on the proposed techniques, using both synthetic data and real data crawled from the web, to validate the effectiveness of our framework.