Surgical gesture classification from video data

  • Authors:
  • Benjamín Béjar Haro; Luca Zappella; René Vidal

  • Affiliations:
  • Center for Imaging Science, Johns Hopkins University (all authors)

  • Venue:
  • MICCAI'12: Proceedings of the 15th International Conference on Medical Image Computing and Computer-Assisted Intervention, Part I
  • Year:
  • 2012


Abstract

Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper we show that, in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first approach, we model each video clip of each surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second, we extract spatio-temporal features from each video clip, learn a dictionary of spatio-temporal words, and use a bag-of-features (BoF) approach to classify new video clips. In the third, we use multiple kernel learning to combine the LDS and BoF approaches. Our experiments show that methods based on video data perform as well as the state-of-the-art approaches based on kinematic data.
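To make the second approach more concrete, the sketch below shows a generic bag-of-features classification pipeline in Python. It is only an illustration of the general BoF idea, not the paper's implementation: the spatio-temporal feature detector and descriptor, vocabulary size, and classifier kernel used by the authors are not specified in the abstract, so the `k=100` vocabulary, the RBF-kernel SVM, and the helper names (`build_codebook`, `bof_histogram`, `classify_clips`) are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def build_codebook(descriptors, k=100, seed=0):
    """Cluster pooled spatio-temporal descriptors into k visual words."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(descriptors)


def bof_histogram(codebook, clip_descriptors):
    """Quantize one clip's descriptors and return an L1-normalized word histogram."""
    words = codebook.predict(clip_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def classify_clips(train_descs, train_labels, test_descs, k=100):
    """Illustrative BoF pipeline (assumed setup, not the paper's exact method).

    `train_descs` / `test_descs` are lists with one (num_features_i, dim)
    array of spatio-temporal descriptors per video clip; descriptor
    extraction itself is not shown here.
    """
    codebook = build_codebook(np.vstack(train_descs), k=k)
    X_train = np.array([bof_histogram(codebook, d) for d in train_descs])
    X_test = np.array([bof_histogram(codebook, d) for d in test_descs])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, train_labels)
    return clf.predict(X_test)
```

In the same spirit, the third approach could be approximated by computing one kernel matrix from the BoF histograms and another from pairwise LDS distances, then feeding a weighted combination to an SVM with a precomputed kernel; learning those weights is what multiple kernel learning adds over a fixed combination.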