SubSift web services and workflows for profiling and comparing scientists and their published works

Authors:
Simon Price;Peter A. Flach;Sebastian Spiegler;Christopher Bailey;Nikki Rogers
Affiliations:
Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH, UK and Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Build ...;Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK;Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK;Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH, UK;Institute for Learning and Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol BS8 1HH, UK
Venue:
Future Generation Computer Systems
Year:
2013

Abstract

Scientific researchers, laboratories, organisations and research communities can be profiled and compared by analysing their published works, including documents ranging from academic papers to web sites, blog posts and Twitter feeds. This paper describes how the vector space model from information retrieval, more commonly associated with full-text search, has been employed in the open source SubSift software to support workflows that profile and compare such collections of documents. SubSift was originally designed to match submitted conference or journal papers to potential peer reviewers based on the similarity between the paper's abstract and the reviewer's publications as found in online bibliographic databases such as Google Scholar. The software is implemented as a family of RESTful web services that, composed into a reusable workflow, have already been used to support several major data mining conferences. Alternative workflows and service compositions are now enabling other applications, such as expert finding for the press and media, organisational profiling, and suggesting potential interdisciplinary research partners. This work is a generalisation and proof-of-concept realisation of an engineering solution that enables RESTful services to be assembled into workflows for analysing general content in a way that is not readily available elsewhere. The challenges and lessons learned in the implementation and use of SubSift are discussed.
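
The matching idea in the abstract, comparing a paper's abstract with a reviewer's publications in a vector space, can be illustrated with a minimal sketch. This is not SubSift's API or its actual implementation: the abstract does not state the term weighting or similarity measure, so tf-idf weights and cosine similarity are assumed here as the conventional choices for the vector space model. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_reviewers`) and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document key to a sparse tf-idf vector (term -> weight).

    Assumption: idf is smoothed as log((1 + N) / (1 + df)) + 1, so every
    term that occurs in a document gets a positive weight.
    """
    counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    vectors = {}
    for key, term_counts in counts.items():
        total = sum(term_counts.values()) or 1
        vectors[key] = {
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_reviewers(papers, reviewers):
    """For each paper, list (reviewer, score) pairs, most similar first.

    Assumption: document frequencies are computed over the union of both
    collections; tuple keys keep paper and reviewer ids from colliding.
    """
    docs = {("paper", pid): text for pid, text in papers.items()}
    docs.update({("reviewer", rid): text for rid, text in reviewers.items()})
    vectors = tfidf_vectors(docs)
    ranking = {}
    for pid in papers:
        scores = [
            (rid, cosine(vectors[("paper", pid)], vectors[("reviewer", rid)]))
            for rid in reviewers
        ]
        ranking[pid] = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranking


# Toy data: one submitted abstract and two reviewer publication profiles.
papers = {"p1": "kernel methods for structured data classification"}
reviewers = {
    "alice": "kernel methods support vector machines pattern analysis kernels",
    "bob": "scientific workflow systems web services orchestration",
}
ranking = rank_reviewers(papers, reviewers)
```

In this toy run the reviewer whose profile shares terms with the abstract is ranked first, and a reviewer with no terms in common scores zero. A real deployment would also need decisions the abstract does not cover, such as stop-word removal, stemming, and how reviewer publications are harvested.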