Scaling-Up and Speeding-Up Video Analytics Inside Database Engine

Authors:
Qiming Chen;Meichun Hsu;Rui Liu;Weihong Wang
Affiliations:
HP Labs, Palo Alto, USA;HP Labs, Palo Alto, USA;HP Labs, Hewlett Packard Co., Beijing, China;HP Labs, Hewlett Packard Co., Beijing, China
Venue:
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Year:
2009

Abstract

Most conventional video processing platforms treat the database merely as a storage engine rather than a computation engine, which causes inefficient data access and massive amounts of data movement. Motivated by the goal of providing a convergent platform, we push video processing down to the database engine using User Defined Functions (UDFs). However, existing UDF technology suffers from two major limitations. First, a UDF cannot take a set of tuples as input or produce one as output, which restricts its modeling capability for complex applications; in addition, tuple-wise pipelined UDF execution often leads to inefficiency and rules out data-parallel computation inside the function. Second, UDFs coded in a non-SQL language such as C either involve hard-to-follow DBMS-internal system calls for interacting with the query executor, or sacrifice performance by converting input objects to strings. To solve these problems, we realized the notion of the Relation Valued Function (RVF) in an industry-scale database engine. With tuple-set input and output, an RVF offers enhanced modeling power, better efficiency, and the potential for in-function data-parallel computation. To let RVF execution interact with the query engine efficiently, we introduced the notion of RVF invocation patterns and, based on them, developed RVF containers for focused system support. We have prototyped these mechanisms on the Postgres database engine and tested their power with Support Vector Machine (SVM) classification and learning, the most widely used analytics model for video understanding. Our experience reveals the value of the proposed approach in multiple dimensions: modeling capability, efficiency, in-function data-parallelism with multi-core CPUs, and usability. All of these are fundamental to converging data-intensive analytics and data management.
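The contrast the abstract draws between a tuple-wise UDF and a Relation Valued Function can be illustrated with a small sketch. This is not the authors' implementation and does not use any Postgres API; it is plain Python under stated assumptions. The names `Row`, `WEIGHTS`, `BIAS`, `svm_score`, `classify_scalar_udf`, and `classify_rvf` are hypothetical, the model is assumed to be a linear SVM, and the weights and frame features are made-up illustrative values.

```python
from typing import Iterable, List, Tuple

# Hypothetical row type: (frame_id, feature_vector). Not from the paper.
Row = Tuple[int, List[float]]

# Assumed linear SVM parameters; illustrative values only.
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = -0.25


def svm_score(features: List[float]) -> float:
    """Linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def classify_scalar_udf(features: List[float]) -> int:
    """Tuple-wise UDF style: one tuple's features in, one label out."""
    return 1 if svm_score(features) >= 0 else -1


def classify_rvf(rows: Iterable[Row]) -> List[Tuple[int, int]]:
    """RVF style: a whole relation in, a whole relation out."""
    # The function sees the full tuple set at once, so it could in
    # principle partition the batch across cores; here it just loops.
    batch = list(rows)
    return [(fid, 1 if svm_score(f) >= 0 else -1) for fid, f in batch]


# Made-up input relation of three frames.
frames: List[Row] = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.0, 0.0, 1.0]),
]

# The tuple-wise form is invoked once per row by the caller.
per_tuple = [(fid, classify_scalar_udf(f)) for fid, f in frames]

# The relation-valued form is invoked once for the whole set.
per_set = classify_rvf(frames)
```

Both forms yield the same labels. The difference is where the loop lives: in the tuple-wise form the caller (standing in for the query executor) drives one call per row, while in the relation-valued form the function receives the entire set and is free to organize the computation itself.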