I am complex: cluster me, don't just rank me

  • Authors:
  • Sihem Amer-Yahia

  • Affiliations:
  • Yahoo! Research, Barcelona, Catalunya, Spain

  • Venue:
  • Proceedings of the 2nd International Workshop on Business intelligencE and the WEB
  • Year:
  • 2011

Abstract

A large number of online applications are built over high-dimensional data. This is the case for shopping, where products have several features (e.g., price and color); dating, where personal profiles are described along several dimensions (e.g., physical features and political views); and entertainment (e.g., movie genre and director, restaurant ambiance and location). In addition, in some applications, items may be accompanied by qualitative data such as movie and restaurant reviews. The typical way users find items in these applications is by entering a keyword query and receiving a ranked list of relevant results. Ideally, just as in Web search, users should spend little time before finding a satisfactory item. In practice, because of the size of the query output, the high dimensionality of items, and, in some cases, the presence of qualitative data, users tend to spend a lot of time trying to understand correlations between item features and item quality. In this talk, I will argue that the ten-blue-links experience we are used to in Web search (keywords as input, a ranked list as output) is inappropriate when querying and ranking high-dimensional data. I will describe two applications: exploring qualitative data and ranked querying of structured data.
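As a minimal illustration of the contrast in the title, the Python sketch below takes the results of a keyword query and, instead of returning only a flat ranked list, groups them by the value of one item feature. The `Item` type, the `rank` and `cluster_by` helpers, the sample restaurants, and their scores are all hypothetical, and grouping on a single attribute is a deliberately simple stand-in, not the approach presented in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    score: float    # hypothetical relevance score from the keyword query
    features: dict  # hypothetical feature values, e.g. {"ambiance": "romantic"}


def rank(items):
    """Flat 'ten blue links' output: items sorted by descending score."""
    return sorted(items, key=lambda it: it.score, reverse=True)


def cluster_by(items, feature):
    """Group ranked items by the value of one feature.

    Items are visited in ranked order, so each group keeps its members in
    ranked order, and groups appear in the order of their best-scoring item
    (dicts preserve insertion order in Python 3.7+).
    """
    groups = defaultdict(list)
    for it in rank(items):
        groups[it.features.get(feature, "unknown")].append(it)
    return dict(groups)


# Hypothetical sample data for illustration only.
restaurants = [
    Item("A", 0.91, {"ambiance": "romantic", "price": "$$$"}),
    Item("B", 0.88, {"ambiance": "casual", "price": "$"}),
    Item("C", 0.85, {"ambiance": "romantic", "price": "$$"}),
    Item("D", 0.70, {"ambiance": "family", "price": "$"}),
]

for value, members in cluster_by(restaurants, "ambiance").items():
    print(value, [m.name for m in members])
# romantic ['A', 'C']
# casual ['B']
# family ['D']
```

The flat ranking interleaves romantic and casual places (A, B, C, D); the grouped view lets a user see at a glance which ambiances are available and pick a group before scanning items, while ranking is still preserved inside each group.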