A benchmark for content-based retrieval in bivariate data collections

Authors:
Maximilian Scherer; Tatiana von Landesberger; Tobias Schreck
Affiliations:
TU Darmstadt, Darmstadt, Germany; TU Darmstadt, Darmstadt, Germany; University of Konstanz, Konstanz, Germany
Venue:
TPDL '12: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries
Year:
2012

Abstract

Large amounts of research data are produced and made publicly available in digital libraries. An important category is bivariate data, that is, measurements of one variable against another. Examples include observations of temperature and ozone levels (in environmental observation), domestic production and unemployment (in economics), or education and income levels (in the social sciences). Content-based retrieval is an important query modality for accessing these data: it allows researchers to search for specific relationships between variables (e.g., a quadratic dependence of temperature on altitude). To date, however, such retrieval remains challenging, because it is unclear which similarity measures to apply. Various approaches have been proposed, yet no benchmarks have been defined to compare their retrieval effectiveness. In this paper, we construct a benchmark for the retrieval of bivariate data, based on a large collection of bivariate research data. To define similarity classes, we use category information annotated by domain experts. We then use the resulting similarity classes to compare several recently proposed content-based retrieval approaches for bivariate data in terms of precision and recall. This study is the first to present a comprehensive benchmark data set and to compare the performance of these techniques. Based on the results obtained for bivariate data, we also identify potential research directions. The benchmark and implementations of the similarity functions are made available to foster research in this emerging area of content-based retrieval.
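The evaluation protocol described in the abstract (expert categories serve as similarity classes, and a ranked retrieval result is scored by precision and recall) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the descriptors or code released with the benchmark: each data set is treated as a list of (x, y) pairs, the quadratic least-squares descriptor and Euclidean distance between coefficient vectors are hypothetical stand-ins for a real similarity function, and the names `quadratic_fit`, `rank` and `precision_recall` are illustrative only. Items that share the query's expert label count as relevant.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Descriptor = Tuple[float, float, float]


def quadratic_fit(points: Sequence[Point]) -> Descriptor:
    """Least-squares fit of y = a*x^2 + b*x + c, returned as (a, b, c).

    Hypothetical descriptor for illustration; needs at least three distinct x values.
    """
    # Accumulate the normal equations (X^T X) w = X^T y for the basis [x^2, x, 1].
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y in points:
        basis = (x * x, x, 1.0)
        for i in range(3):
            rhs[i] += basis[i] * y
            for j in range(3):
                A[i][j] += basis[i] * basis[j]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    w = [0.0] * 3
    for i in range(2, -1, -1):
        w[i] = (rhs[i] - sum(A[i][j] * w[j] for j in range(i + 1, 3))) / A[i][i]
    return (w[0], w[1], w[2])


def rank(query_id: str, descriptors: Dict[str, Descriptor]) -> List[str]:
    """Return all other item ids, ordered by Euclidean distance to the query descriptor."""
    q = descriptors[query_id]
    others = [i for i in descriptors if i != query_id]
    return sorted(others, key=lambda i: math.dist(q, descriptors[i]))


def precision_recall(query_id: str, ranking: List[str],
                     labels: Dict[str, str]) -> List[Tuple[float, float]]:
    """(precision, recall) at every cutoff k; relevant = same expert class as the query."""
    target = labels[query_id]
    total_relevant = sum(1 for i, lab in labels.items() if i != query_id and lab == target)
    curve, hits = [], 0
    for k, item in enumerate(ranking, start=1):
        if labels[item] == target:
            hits += 1
        recall = hits / total_relevant if total_relevant else 0.0
        curve.append((hits / k, recall))
    return curve


# Toy collection with two hypothetical expert classes (quadratic vs. linear relationships).
xs = range(-3, 4)
data = {
    "quad1": [(x, x * x) for x in xs],
    "quad2": [(x, 1.1 * x * x + 0.5) for x in xs],
    "lin1": [(x, 2.0 * x) for x in xs],
    "lin2": [(x, 2.2 * x + 0.3) for x in xs],
}
labels = {"quad1": "quadratic", "quad2": "quadratic", "lin1": "linear", "lin2": "linear"}

descriptors = {k: quadratic_fit(pts) for k, pts in data.items()}
ranking = rank("quad1", descriptors)
curve = precision_recall("quad1", ranking, labels)
print(ranking)  # expected: ['quad2', 'lin1', 'lin2']
print(curve)
```

Because raw regression coefficients depend on the units of x and y, a practical descriptor would likely normalize both axes before fitting; the sketch omits this to stay short. Averaging the per-query curves over every item in the collection would give the kind of aggregate precision-recall comparison the abstract refers to.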