SubSift web services and workflows for profiling and comparing scientists and their published works

  • Authors:
  • Simon Price;Peter A. Flach;Sebastian Spiegler;Christopher Bailey;Nikki Rogers

  • Affiliations:
  • Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH, UK and Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Build ...;Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK;Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK;Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH, UK;Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH, UK

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2013

Abstract

Scientific researchers, laboratories, organisations and research communities can be profiled and compared by analysing their published works, including documents ranging from academic papers to web sites, blog posts and Twitter feeds. This paper describes how the vector space model from information retrieval, more commonly associated with full text search, has been employed in the open source SubSift software to support workflows that profile and compare such collections of documents. SubSift was originally designed to match submitted conference or journal papers to potential peer reviewers based on the similarity between the paper's abstract and the reviewer's publications as found in online bibliographic databases such as Google Scholar. The software is implemented as a family of RESTful web services that, composed into a reusable workflow, have already been used to support several major data mining conferences. Alternative workflows and service compositions are now enabling other interesting applications, such as expert finding for the press and media, organisational profiling, and suggesting potential interdisciplinary research partners. This work generalises the original approach and provides a proof-of-concept engineering solution in which RESTful services are assembled into workflows to analyse general content, a capability that is not readily available elsewhere. The challenges and lessons learned in the implementation and use of SubSift are discussed.
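
The matching step described in the abstract, comparing a paper's abstract with each reviewer's publications under the vector space model, can be illustrated with a minimal sketch. It is not SubSift's implementation or API: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_reviewers`), the simple tf-idf weighting and the sample data are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector.

    docs is a dict of name -> text. The weighting (relative term frequency
    times log(N / document frequency)) is one common choice, assumed here.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    vectors = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        vectors[name] = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in term_counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_reviewers(abstract, reviewer_profiles):
    """Return (reviewer, similarity) pairs sorted from most to least similar.

    reviewer_profiles is a dict of reviewer name -> concatenated publication
    text (hypothetical input; a real workflow would fetch this elsewhere).
    """
    docs = dict(reviewer_profiles)
    docs["__paper__"] = abstract  # reserved key for the submitted paper
    vectors = tfidf_vectors(docs)
    paper_vector = vectors.pop("__paper__")
    scores = {name: cosine(paper_vector, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative data only, not taken from the paper.
    profiles = {
        "alice": "kernel methods support vector machines classification",
        "bob": "protein folding molecular dynamics simulation",
    }
    for name, score in rank_reviewers(
        "a new kernel for support vector classification", profiles
    ):
        print(f"{name}: {score:.3f}")
```

In this sketch a reviewer who shares no terms with the abstract scores exactly zero, and terms that occur in every document receive zero weight, so only distinguishing vocabulary contributes to the ranking.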