PIKM 2012: 5th ACM workshop for PhD students in information and knowledge management
Proceedings of the 21st ACM international conference on Information and knowledge management
For decades, scientists bemoaned the scarcity of observational data to analyze and against which to test their models. Exponential growth in data volumes from ever-cheaper environmental sensors has provided scientists with the answer to their prayers: "big data". Now, scientists face a new challenge: with terabytes, petabytes or exabytes of data at hand, stored in thousands of heterogeneous datasets, how can they find the datasets most relevant to their research interests? If they cannot find the data, they may as well never have collected it; the data is lost to them. Our research addresses this challenge, using an existing scientific archive as our test-bed. We approach the problem in a new way: by adapting Information Retrieval techniques, developed for searching text documents, to the world of (primarily numeric) scientific data. We propose an approach that uses a blend of automated and "semi-curated" methods to extract metadata from large archives of scientific data. We then perform searches over the extracted metadata, returning results ranked by similarity to the query terms. We briefly describe an implementation at an ocean observatory, built to validate the proposed approach. We propose performance and scalability research to explore how continued archive growth will affect our goal of interactive response times at any scale.
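To make the idea of ranked search over extracted metadata concrete, the following is a minimal sketch, not the implementation described above. It assumes each dataset has already been summarized as a set of per-variable value ranges, and it scores a dataset by how much of each queried range it covers. The names `DatasetSummary`, `range_similarity`, `score`, and `ranked_search`, the sample catalog, and the scoring function itself are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical metadata record extracted from one dataset."""
    name: str
    ranges: dict  # variable name -> (min, max) observed in the dataset


def range_similarity(query, observed):
    """Fraction of the query interval covered by the observed interval (0 to 1)."""
    q_lo, q_hi = query
    o_lo, o_hi = observed
    if q_lo == q_hi:
        # Point query: full credit only if the point lies inside the observed range.
        return 1.0 if o_lo <= q_lo <= o_hi else 0.0
    overlap = min(q_hi, o_hi) - max(q_lo, o_lo)
    return max(overlap, 0.0) / (q_hi - q_lo)


def score(query, summary):
    """Mean similarity over all query terms; a variable the dataset lacks scores 0."""
    if not query:
        return 0.0
    total = 0.0
    for variable, q_range in query.items():
        if variable in summary.ranges:
            total += range_similarity(q_range, summary.ranges[variable])
    return total / len(query)


def ranked_search(query, catalog, k=10):
    """Return up to k (score, dataset name) pairs, best match first."""
    scored = [(score(query, s), s.name) for s in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Usage with an invented three-dataset catalog (assumed values, not real data).
catalog = [
    DatasetSummary("cruise_a", {"temperature": (5.0, 15.0), "salinity": (30.0, 35.0)}),
    DatasetSummary("station_b", {"temperature": (12.0, 20.0)}),
    DatasetSummary("glider_c", {"salinity": (0.0, 10.0)}),
]
query = {"temperature": (10.0, 20.0), "salinity": (32.0, 34.0)}
results = ranked_search(query, catalog)
```

Because the search touches only the compact summaries and never the underlying observations, the cost of a query in this sketch grows with the number of datasets and not with the raw data volume, which is one plausible route to interactive response times as an archive grows.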