Lightweight integration of IR and DB for scalable hybrid search with integrated ranking support

Authors:
Haofen Wang; Thanh Tran; Chang Liu; Linyun Fu
Affiliations:
Shanghai Jiao Tong University, Shanghai 200240, China; Institute AIFB, Universität Karlsruhe, D-76128 Karlsruhe, Germany; Shanghai Jiao Tong University, Shanghai 200240, China; Shanghai Jiao Tong University, Shanghai 200240, China
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2011

Abstract

The Web contains a large number of documents and a growing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries are the principal means of retrieving structured data, keyword queries are typically used for document retrieval. A form of hybrid search that seamlessly integrates these formalisms to query both textual and structured data can address more complex information needs. However, hybrid search in the large-scale Web environment faces several challenges. First, repositories are needed that can store and index large amounts of semantic data as well as textual data in documents, and manage them in an integrated way. Second, methods for hybrid query answering are needed to exploit the data in such an integrated repository. These methods should be fast and scalable and, in particular, should support flexible ranking schemes so that only the most relevant results are returned rather than all of them. In this paper, we present CE^2, an integrated solution that leverages mature information retrieval and database technologies to support large-scale hybrid search. For scalable and integrated management of data, CE^2 integrates off-the-shelf database solutions with inverted indexes. Efficient hybrid query processing is supported through novel data structures and algorithms that allow advanced ranking schemes to be tightly integrated. Furthermore, a concrete ranking scheme is proposed that takes features from both textual and structured data into account. Experiments conducted on DBpedia and Wikipedia show that CE^2 performs well in terms of both effectiveness and efficiency.
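
To make the idea of a hybrid query concrete, the following is a minimal sketch, not the CE^2 implementation. It assumes a toy in-memory setup: a hand-built inverted index stands in for the IR side, a set of triples stands in for the database side, and a plain TF-IDF sum stands in for the ranking scheme. All names (`DOCS`, `TRIPLES`, `hybrid_search`, and the helper functions) and the sample data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: documents plus RDF-style annotation triples
# whose subject is a document id.
DOCS = {
    "d1": "shanghai is a large city in china",
    "d2": "karlsruhe is a city in germany",
    "d3": "china and germany trade reports",
}
TRIPLES = {
    ("d1", "type", "City"),
    ("d1", "locatedIn", "China"),
    ("d2", "type", "City"),
    ("d2", "locatedIn", "Germany"),
    ("d3", "type", "Report"),
}


def build_inverted_index(docs):
    """Map each term to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] += 1
    return index


def keyword_scores(index, n_docs, keywords):
    """Sum a simple TF-IDF weight per document over the query keywords."""
    scores = Counter()
    for term in keywords:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores


def structured_matches(triples, patterns):
    """Return subjects that satisfy every (predicate, object) pattern."""
    result = None
    for pred, obj in patterns:
        subjects = {s for s, p, o in triples if p == pred and o == obj}
        result = subjects if result is None else result & subjects
    return result if result is not None else set()


def hybrid_search(keywords, patterns, k=10, docs=DOCS, triples=TRIPLES):
    """Keep documents matching the structured patterns, rank them by text score."""
    index = build_inverted_index(docs)
    text = keyword_scores(index, len(docs), keywords)
    # With no structured patterns, every document is a candidate.
    candidates = structured_matches(triples, patterns) if patterns else set(docs)
    ranked = sorted(((text[d], d) for d in candidates if text[d] > 0), reverse=True)
    return [d for _, d in ranked[:k]]
```

For example, `hybrid_search(["china"], [("type", "City")])` keeps only documents annotated as a `City` and ranks them by how well they match the keyword `china`. The sketch ranks on the textual score alone; the ranking scheme described in the abstract also draws on features of the structured data, which is not modeled here.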