Optimizing data analysis with a semi-structured time series database

  • Authors: Ledion Bitincka; Archana Ganapathi; Stephen Sorkin; Steve Zhang
  • Affiliations: Splunk Inc.; Splunk Inc.; Splunk Inc.; Splunk Inc.
  • Venue: SLAML'10 Proceedings of the 2010 workshop on Managing systems via log analysis and machine learning techniques
  • Year: 2010

Abstract

Most modern systems generate abundant and diverse log data. With dwindling storage costs, there are fewer reasons to summarize or discard data. However, the lack of tools to efficiently store and cross-correlate heterogeneous datasets makes it tedious to mine the data for analytic insights. In this paper, we present Splunk, a semi-structured time series database that can be used to index, search, and analyze massive heterogeneous datasets. We share observations, lessons, and case studies from real-world datasets, and demonstrate Splunk's power and flexibility in enabling insightful data mining searches.