A comprehensive human computation framework: with application to image labeling

  • Authors: Yang Yang; Bin B. Zhu; Rui Guo; Linjun Yang; Shipeng Li; Nenghai Yu

  • Affiliations: University of Science and Technology of China, Hefei, Anhui, China; Microsoft Research Asia, Beijing, China; Beihang University, Beijing, China; Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China; University of Science and Technology of China, Hefei, Anhui, China

  • Venue: MM '08 Proceedings of the 16th ACM international conference on Multimedia
  • Year: 2008

Abstract

Labeling images and videos is important both for image and video search and for enabling computers to understand visual content. Manual labeling is tedious and costly, and fully automatic image and video labeling remains out of reach. In this paper, we adopt a Web 2.0 approach to labeling images and videos efficiently: Internet users around the world are mobilized to apply their "common sense" to problems that are hard for today's computers, such as labeling images and videos. We first propose a general human computation framework that binds problem providers, Web sites, and Internet users together to solve large-scale common sense problems efficiently and economically. The framework addresses technical challenges such as preventing a malicious party from attacking others, removing answers from bots, and distilling human answers to produce high-quality solutions. The framework is then applied to image labeling in three incremental refinement stages. The first stage collects candidate labels for objects in an image. The second stage refines the candidate labels through multiple-choice questions; synonymous labels are also correlated in this stage. To prevent bots and lazy humans from selecting all the choices, trap labels are generated automatically and intermixed with the candidate labels. Semantic distance is used to ensure that the selected trap labels differ enough from the candidate labels that no human user would select them by mistake. The last stage asks users to locate, in a segmented image, the object that corresponds to a given label. Experimental results are also reported. They indicate that the proposed schemes can successfully remove spurious answers from bots and distill human answers to produce high-quality image labels.
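
The trap-label idea in the second stage can be illustrated with a minimal sketch. The abstract does not say how semantic distance is computed, which threshold is used, or how responses are filtered, so everything named below (`TOY_CATEGORY`, `toy_distance`, `min_distance`, and the three helper functions) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
import random
from typing import Callable, Iterable, List, Sequence

# Hypothetical toy lexicon: each word maps to a coarse category.
# A real system would use a proper semantic distance measure instead.
TOY_CATEGORY = {
    "dog": "animal", "puppy": "animal", "cat": "animal", "wolf": "animal",
    "truck": "vehicle", "teapot": "object", "violin": "object",
}


def toy_distance(a: str, b: str) -> float:
    """Assumed distance: 0 for identical words, 0.5 within a category, 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if TOY_CATEGORY[a] == TOY_CATEGORY[b] else 1.0


def pick_trap_labels(
    candidates: Sequence[str],
    vocabulary: Iterable[str],
    distance: Callable[[str, str], float],
    min_distance: float,
    count: int,
) -> List[str]:
    """Return up to `count` words that are far from every candidate label."""
    candidate_set = set(candidates)
    traps: List[str] = []
    for word in vocabulary:
        if len(traps) >= count:
            break
        if word in candidate_set:
            continue
        # A trap must be distant from all candidates, so that a careful
        # human would not plausibly pick it for this image.
        if all(distance(word, c) >= min_distance for c in candidates):
            traps.append(word)
    return traps


def build_choices(
    candidates: Sequence[str], traps: Sequence[str], rng: random.Random
) -> List[str]:
    """Intermix candidate and trap labels in random order."""
    choices = list(candidates) + list(traps)
    rng.shuffle(choices)
    return choices


def keep_trusted_responses(
    responses: Sequence[Sequence[str]], traps: Sequence[str]
) -> List[List[str]]:
    """Discard any response that selected at least one trap label."""
    trap_set = set(traps)
    return [list(r) for r in responses if not (set(r) & trap_set)]


if __name__ == "__main__":
    candidates = ["dog", "puppy"]
    vocabulary = ["wolf", "teapot", "cat", "violin", "truck"]
    traps = pick_trap_labels(candidates, vocabulary, toy_distance, 1.0, 2)
    print(build_choices(candidates, traps, random.Random()))
    responses = [["dog"], ["dog", "puppy", "teapot", "violin"], ["puppy", "dog"]]
    print(keep_trusted_responses(responses, traps))
```

In this sketch, a respondent who ticks every choice necessarily ticks a trap and is filtered out, while the distance threshold keeps near-synonyms of the candidates (here, other animals) from being used as traps.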