Querying big data

  • Authors:
  • Boris Novikov;Natalia Vassilieva;Anna Yarygina

  • Affiliations:
  • Saint Petersburg University;Senior Researcher, HP Labs;Saint Petersburg University

  • Venue:
  • Proceedings of the 13th International Conference on Computer Systems and Technologies
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The term "Big Data" became a buzzword and is widely used in both research and industrial worlds. Typically the concept of big data assumes a variety of different sources of information and velocity of complex analytical processing, rather than just a huge and growing volume of data. All variety, velocity, and volume create new research challenges, as nearly all techniques and tools commonly used in data processing have to be re-considered. Variety and uncertainty of big data require a mixture of exact and similarity search and grouping of complex objects based on different attributes. High-level declarative query languages are important in this context due to expressiveness and potential for optimization. In this talk we are mostly interested in an algebraic layer for complex query processing which resides between user interface (most likely, graphical) and execution engine in layered system architecture. We analyze the applicability of existing models and query languages. We describe a systematic approach to similarity handling of complex objects, simultaneous application of different similarity measures and querying paradigms, complex searching and querying, combined semi-structured and unstructured search. We introduce the adaptive abstract operations based on the concept of fuzzy set, which are needed to support uniform handling of different kinds of similarity processing. To ensure an efficient implementation, approximate algorithms with controlled quality are required to enable quality versus performance trade-off for timeliness of similarity processing. Uniform and adaptive operations enable high-level declarative definition of complex queries and provide options for optimization.