Towards keyword-driven analytical processing

  • Authors:
  • Ping Wu;Yannis Sismanis;Berthold Reinwald

  • Affiliations:
  • University of California, Santa Barbara, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA

  • Venue:
  • Proceedings of the 2007 ACM SIGMOD international conference on Management of data
  • Year:
  • 2007

Abstract

Gaining business insights from data has recently been the focus of research and product development. On-Line Analytical Processing (OLAP) tools provide elaborate query languages that allow users to group and aggregate data in various ways and to explore interesting trends and patterns in the data. However, the dynamic nature of today's data, along with the overwhelming detail at which data is provided, makes it nearly impossible to organize the data in the way a business analyst needs in order to think about it. In this paper, we introduce "Keyword-Driven Analytical Processing" (KDAP), which combines intuitive keyword-based search with the power of aggregation in OLAP, without requiring considerable effort to organize the data in terms that the business analyst understands. Our design point is a user mentality that we frequently encounter: "users don't know how to specify what they want, but they know it when they see it". We present our complete solution framework, which implements various phases, from disambiguating the keyword terms to organizing and ranking the results in dynamic facets that allow the user to explore the aggregation space efficiently. We address specific issues that analysts encounter, such as joins, groupings, and aggregations, and we provide efficient and scalable solutions. We show how KDAP handles both categorical and numerical data equally well. Finally, using various experiments on real data, we demonstrate the generality and applicability of KDAP to two different aspects of OLAP: finding exceptions or surprises in the data, and finding bellwether regions where local aggregates are highly correlated with global aggregates.