Heterogeneous web data search using relevance-based on the fly data integration

Authors:
Daniel M. Herzig; Thanh Tran
Affiliations:
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Venue:
Proceedings of the 21st international conference on World Wide Web
Year:
2012

Abstract

Searching over heterogeneous structured data on the Web is challenging because of vocabulary and structure mismatches among different data sources. In this paper, we study two existing strategies and present a new approach for integrating additional data sources into the search process. The first strategy relies on data integration to mediate mismatches through the upfront computation of mappings, based on which queries are rewritten to fit individual sources. At the other extreme is keyword search, which requires no upfront investment but ignores structural information. Building on these strategies, we present a hybrid approach that combines the advantages of both: it requires no upfront data integration, yet still leverages the fine-grained structure of the underlying data. For a structured query adhering to the vocabulary of just one source, the so-called seed query, we construct an entity relevance model (ERM), which captures the content and the structure of the seed query results. This ERM is then aligned on the fly with keyword search results retrieved from other sources and is also used to rank these results. Experiments on large-scale, real-world data sets suggest that data integration yields higher search effectiveness than keyword search, and that our new hybrid approach consistently outperforms both strategies.
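The abstract describes the hybrid pipeline only at a high level. The sketch below illustrates the general idea in Python and should not be read as the authors' exact formulation. Several simplifying assumptions are ours, not the paper's:

- Entities are flat maps from attribute name to text.
- The ERM is a per-attribute unigram relevance model, averaged uniformly over the seed query results.
- Each candidate from another source is scored by negative cross-entropy against a Jelinek-Mercer smoothed entity model, with a hypothetical default of lam = 0.5.
- "Alignment" greedily maps each ERM attribute to the candidate attribute on which it scores best.

All function and attribute names are hypothetical. The keyword-search retrieval step itself is not shown; candidates are assumed to be already retrieved and non-empty.

```python
import math
from collections import Counter

# Hypothetical data model for this sketch: attribute name -> text value.
Entity = dict[str, str]
Model = dict[str, float]  # term -> probability


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def ml_model(text: str) -> Model:
    """Maximum-likelihood unigram model of a single attribute value."""
    terms = tokenize(text)
    if not terms:
        return {}
    return {t: c / len(terms) for t, c in Counter(terms).items()}


def build_erm(seed_results: list[Entity]) -> dict[str, Model]:
    """Per-attribute relevance model: the average of the ML models of all
    seed results that carry that attribute (each result weighted equally)."""
    sums: dict[str, Counter] = {}
    support: Counter = Counter()
    for entity in seed_results:
        for attr, value in entity.items():
            model = ml_model(value)
            if model:
                sums.setdefault(attr, Counter()).update(model)
                support[attr] += 1
    return {a: {t: p / support[a] for t, p in s.items()} for a, s in sums.items()}


def background_model(candidates: list[Entity], extra_vocab: set[str]) -> Model:
    """Add-one smoothed collection model over all candidate text; extra_vocab
    guarantees every ERM term a non-zero probability (no log(0) later)."""
    counts = Counter(t for e in candidates for v in e.values() for t in tokenize(v))
    vocab = set(counts) | extra_vocab
    total = sum(counts.values()) + len(vocab)
    return {t: (counts[t] + 1) / total for t in vocab}


def attr_score(erm_attr: Model, value: str, bg: Model, lam: float) -> float:
    """Negative cross-entropy between one ERM attribute model and a
    Jelinek-Mercer smoothed model of one candidate attribute value."""
    ml = ml_model(value)
    return sum(p * math.log((1 - lam) * ml.get(t, 0.0) + lam * bg[t])
               for t, p in erm_attr.items())


def align_and_score(erm: dict[str, Model], cand: Entity, bg: Model, lam: float):
    """Greedy on-the-fly alignment: map each ERM attribute to the candidate
    attribute on which it scores best, and sum those best scores."""
    total, mapping = 0.0, {}
    for attr, model in erm.items():
        scores = {b: attr_score(model, v, bg, lam) for b, v in cand.items()}
        best = max(scores, key=scores.get)
        mapping[attr] = best
        total += scores[best]
    return total, mapping


def rank(seed_results: list[Entity], candidates: list[Entity], lam: float = 0.5):
    """Rank candidates from other sources against the ERM built from the
    seed results; returns (score, mapping, entity) tuples, best first."""
    erm = build_erm(seed_results)
    bg = background_model(candidates, {t for m in erm.values() for t in m})
    results = [(*align_and_score(erm, c, bg, lam), c) for c in candidates]
    return sorted(results, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    seed = [  # results of a structured seed query against source A
        {"name": "The Matrix", "director": "Lana Wachowski Lilly Wachowski"},
        {"name": "Matrix Reloaded", "director": "Lana Wachowski Lilly Wachowski"},
    ]
    others = [  # keyword-search hits from source B, with a different schema
        {"label": "The Godfather", "directedBy": "Francis Ford Coppola"},
        {"label": "The Matrix", "directedBy": "Wachowski"},
    ]
    for score, mapping, entity in rank(seed, others):
        print(round(score, 2), mapping, entity["label"])
```

With these toy inputs, "The Matrix" from source B should rank above the unrelated candidate, with `name` aligned to `label` and `director` aligned to `directedBy`, even though the two sources share no attribute names. Greedy per-attribute alignment is the simplest choice here. Its drawback is that it can map two ERM attributes to the same candidate attribute, which a one-to-one assignment would rule out.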