Building optimal information systems automatically: configuration space exploration for biomedical information systems

Authors:
Zi Yang;Elmer Garduno;Yan Fang;Avner Maiberg;Collin McCormack;Eric Nyberg
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Sinnia, Mexico City, Mexico;Oracle Corporation, Redwood Shores, CA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;The Boeing Company, Bellevue, WA, USA;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 22nd ACM international conference on information & knowledge management
Year:
2013

Abstract

Software frameworks that support the integration and scaling of text analysis algorithms make it possible to build complex, high-performance information systems for information extraction, information retrieval, and question answering; IBM's Watson is a prominent example. As the complexity and scale of information systems grow, it becomes more challenging to determine effectively and efficiently which toolkits, algorithms, knowledge bases, or other resources should be integrated into an information system to achieve a desired or optimal level of performance on a given task. This paper presents a formal representation of the space of possible system configurations given a set of information processing components and their parameters (the configuration space), and discusses algorithmic approaches to determining the optimal configuration within a given configuration space (configuration space exploration, or CSE). We introduce the CSE framework, an extension to the UIMA framework that provides a general distributed solution for building and exploring configuration spaces for information systems. The CSE framework was used to implement biomedical information systems in case studies involving over a trillion different configuration combinations of components and parameter values, operating on question answering tasks from the TREC Genomics track. The framework automatically and efficiently evaluated different system configurations and identified configurations that achieved better results than prior published results.
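
As a rough illustration of the idea of a configuration space, the following minimal Python sketch treats each pipeline phase as a set of interchangeable options, enumerates every combination, and keeps the best-scoring one. It is not the CSE framework's API or the authors' algorithm: the phase names, option values, and the `toy_score` function are all hypothetical stand-ins, and exhaustive enumeration is used only because the example space is tiny.

```python
from itertools import product
from math import prod

# Hypothetical phases and options; none of these names come from the paper.
SPACE = {
    "tokenizer": ["whitespace", "stemmed"],
    "retriever": ["bm25", "lm"],
    "top_k": [10, 50, 100],
}


def space_size(space):
    """Number of distinct configurations: the product of the option counts."""
    return prod(len(options) for options in space.values())


def configurations(space):
    """Yield every configuration as a dict mapping phase name to chosen option."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def toy_score(config):
    """Assumed stand-in for running a pipeline and measuring task performance."""
    score = 0.0
    if config["tokenizer"] == "stemmed":
        score += 0.1
    if config["retriever"] == "bm25":
        score += 0.2
    score += config["top_k"] / 1000
    return score


def explore(space, score):
    """Exhaustive exploration: evaluate every configuration and keep the best."""
    return max(configurations(space), key=score)


best = explore(SPACE, toy_score)
print(space_size(SPACE), best)
```

Because the size of the space is the product of the option counts, it grows multiplicatively with every added component or parameter value, so exhaustive enumeration of this kind stops being feasible well before the trillion-configuration spaces mentioned in the abstract.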