Scientific researchers, laboratories, organisations and research communities can be profiled and compared by analysing their published works, which range from academic papers to web sites, blog posts and Twitter feeds. This paper describes how the open-source SubSift software uses the vector space model from information retrieval, more commonly associated with full-text search, to support workflows that profile and compare such collections of documents. SubSift was originally designed to match papers submitted to a conference or journal with potential peer reviewers. It scores each match by the similarity between the paper's abstract and the candidate reviewer's publications as listed in online bibliographic databases such as Google Scholar. The software is implemented as a family of RESTful web services. Composed into a reusable workflow, these services have already supported several major data mining conferences. Alternative workflows and service compositions now enable further applications, such as expert finding for the press and media, organisational profiling, and suggesting potential interdisciplinary research partners. This work presents a generalisation, with a proof-of-concept realisation, of an engineering solution that allows RESTful services to be assembled into workflows that analyse general content in a way not readily available elsewhere. The challenges and lessons learned in implementing and using SubSift are discussed.
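To make the reviewer-matching idea concrete, the following is a minimal sketch of vector space similarity between paper abstracts and reviewer profiles. It is not SubSift's implementation or its REST API. The sketch rests on three assumptions: plain TF-IDF weighting (raw term frequency × log(N/df)), cosine similarity, and IDF computed over papers and reviewer profiles together so that both share one vector space. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_reviewers`) and the toy reviewer profiles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokeniser: lowercase, keep runs of letters.
    # No stemming or stop-word removal (a real system would likely do more).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} using tf * log(N / df) (assumed weighting)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # count each term once per document
    return {
        doc_id: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(u, v):
    # Cosine similarity of two sparse vectors; 0.0 if either is all zeros.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_reviewers(papers, reviewers):
    """For each paper, list (reviewer, score) pairs sorted by descending similarity."""
    # One shared corpus so papers and reviewer profiles get the same IDF weights.
    corpus = {("paper", k): text for k, text in papers.items()}
    corpus.update({("reviewer", k): text for k, text in reviewers.items()})
    vecs = tfidf_vectors(corpus)
    ranking = {}
    for p in papers:
        scores = [(r, cosine(vecs[("paper", p)], vecs[("reviewer", r)])) for r in reviewers]
        ranking[p] = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranking


# Hypothetical reviewer profiles (e.g. concatenated publication titles) and abstracts.
reviewers = {
    "alice": "kernel methods for structured data; graph kernels and support vector machines",
    "bob": "scientific workflows; RESTful web services and workflow languages",
    "carol": "information retrieval; vector space model and full text search",
}
papers = {
    "p1": "a vector space model for ranking documents in full text search",
    "p2": "composing RESTful services into reusable scientific workflows",
}

ranking = rank_reviewers(papers, reviewers)
for p in papers:
    print(p, ranking[p][0][0])  # p1 carol, then p2 bob
```

Computing IDF over the combined corpus is one design choice among several. IDF could instead be computed over reviewer profiles only, or with smoothing. Any of these changes the absolute scores but uses the same cosine ranking step.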