Building a display of missing information in a data sieve

  • Authors: Curtis E. Dyreson; Omar U. Florez
  • Affiliations: Utah State University, Logan, UT, USA; Utah State University, Logan, UT, USA
  • Venue: Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP
  • Year: 2011

Abstract

A data sieve filters a data stream to harvest data of interest and summarizes the harvested data in a multidimensional database (MDB). To build a data sieve, a designer supplies a list of filters. Each filter consists of a filter unit and a filter category for each dimension. The filter unit specifies a pattern (a regular expression) to match as the data stream is filtered. The filter category is the system of measurement in which occurrences of that pattern are counted or otherwise aggregated. Because filtering discards some of the data, it creates incomplete regions within the MDB, and the missing data complicates querying. Although a query on the filtered data can be analysed automatically to determine whether enough information has been harvested to satisfy it, a better strategy is to prevent users from formulating unsatisfiable queries in the first place. To help users formulate only satisfiable queries, the GUI for a data sieve needs to color or otherwise mark regions of complete, partially complete, and missing data. As a user constructs a query by choosing categories and units, the displayed incomplete regions shift and change, curtailing future choices. For instance, if a user selects a spatial unit of Australia, the display for a temporal category of days may need to be colored as incomplete because no filter would satisfy both selections. We describe an algorithm that uses bit strings to create and maintain the display of incomplete information in a data sieve in real time.
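
The abstract does not spell out the bit-string algorithm, so the following is only a minimal sketch of one way such a display could be maintained, not the authors' method. It assumes each filter is simplified to a single label per dimension, gives each filter one bit position, and marks a choice as missing when no filter supports it together with the current selections. The filter list, the function names (`build_masks`, `surviving`, `display_state`), and the two-state "available"/"missing" coloring are all hypothetical; partially complete regions are not modeled.

```python
from collections import defaultdict

# Hypothetical filter list: one label per dimension for each filter.
# (A real filter would carry both a unit and a category per dimension.)
FILTERS = [
    {"space": "Australia", "time": "years"},
    {"space": "Utah", "time": "days"},
    {"space": "Utah", "time": "years"},
]


def build_masks(filters):
    """Map each (dimension, label) choice to a bit string of supporting filters.

    Bit i is set when filter i carries that label in that dimension.
    """
    masks = defaultdict(int)
    for i, flt in enumerate(filters):
        for dim, label in flt.items():
            masks[(dim, label)] |= 1 << i
    return dict(masks)


def surviving(masks, selections, n_filters):
    """AND together the masks of all current selections.

    The result has bit i set when filter i is consistent with every selection.
    """
    alive = (1 << n_filters) - 1  # start with every filter alive
    for choice in selections:
        alive &= masks.get(choice, 0)
    return alive


def display_state(masks, selections, n_filters):
    """Label every choice 'available' or 'missing' given the selections so far."""
    alive = surviving(masks, selections, n_filters)
    return {
        choice: "available" if alive & mask else "missing"
        for choice, mask in masks.items()
    }


masks = build_masks(FILTERS)
state = display_state(masks, [("space", "Australia")], len(FILTERS))
```

With these made-up filters, selecting the spatial unit Australia leaves only filter 0 alive, so `state[("time", "days")]` is `"missing"` while `state[("time", "years")]` stays `"available"`, mirroring the Australia/days example in the abstract. Each update costs one bitwise AND per displayed choice, which is the kind of cheap operation that makes refreshing the display on every user selection plausible.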