Flexible and efficient querying and ranking on hyperlinked data sources

Authors:
Ramakrishna Varadarajan;Vagelis Hristidis;Louiqa Raschid;Maria-Esther Vidal;Luis Ibáñez;Héctor Rodríguez-Drumond
Affiliations:
Florida International University, Miami, FL;Florida International University, Miami, FL;University of Maryland, College Park, MD;Universidad Simón Bolívar, Caracas, Venezuela;Universidad Simón Bolívar, Caracas, Venezuela;Universidad Simón Bolívar, Caracas, Venezuela
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

References (19):
Online aggregation. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
A query language for a Web-site management system. ACM SIGMOD Record.
Optimal aggregation algorithms for middleware. PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Topic-sensitive PageRank. Proceedings of the 11th international conference on World Wide Web.
Comparing top k lists. SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms.
WebOQL: Restructuring Documents, Databases, and Webs. ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering.
Proximity Search in Databases. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases.
W3QS: A Query System for the World-Wide Web. VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases.
Artificial Intelligence: A Modern Approach (book).
Comparing and aggregating rankings with ties. PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
AggregateRank: bringing order to web sites. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Topical link analysis for web search. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Ranking target objects of navigational queries. WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management.
Comparing Partial Rankings. SIAM Journal on Discrete Mathematics.
DISCOVER: keyword search in relational databases. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases.
Complex queries over web repositories. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.
ObjectRank: authority-based keyword search in databases. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
Explaining and Reformulating Authority Flow Queries. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.
NAGA: Searching and Ranking Knowledge. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.

Cited by (4):
Using medians to generate consensus rankings for biological data. SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management.
Effective ranking techniques for book review retrieval based on the structural feature. ICHIT'11 Proceedings of the 5th international conference on Convergence and hybrid information technology.
Ranking objects by following paths in entity-relationship graphs. Proceedings of the 4th workshop for Ph.D. students in information & knowledge management.
Towards query model integration: topology-aware, IR-inspired metrics for declarative graph querying. Proceedings of the Joint EDBT/ICDT 2013 Workshops.

Abstract

There has been an explosion of hyperlinked data in many domains, e.g., the biological Web. Expressive query languages and effective ranking techniques are required to convert this data into browsable knowledge. We propose the Graph Information Discovery (GID) framework to support sophisticated user queries on a rich web of annotated and hyperlinked data entries, where query answers need to be ranked according to customized ranking criteria, e.g., PageRank or ObjectRank. GID has a data model comprising a schema graph and a data graph, as well as an intuitive query interface. The GID framework allows users to easily formulate queries consisting of sequences of hard filters (selection predicates) and soft filters (ranking criteria); it can also be combined with other specialized graph query languages to enhance their ranking capabilities. GID queries have well-defined semantics and are implemented by a set of physical operators, each of which produces a ranked result graph. We discuss rewriting opportunities that enable efficient evaluation of GID queries. Soft filters are a key feature of GID and are implemented using authority flow ranking techniques; these rankings are query dependent and expensive to compute at runtime. We present approximate optimization techniques for GID soft filter queries that are based on the properties of random walks and use novel path-length-bound and graph-sampling approximations. We experimentally validate our optimization techniques on large biological and bibliographic datasets. Our techniques can produce high-quality top-K answers while reducing evaluation time by up to an order of magnitude compared with computing the exact solution.
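The abstract says GID queries chain hard filters (selection predicates) with soft filters implemented by authority flow ranking, and that a path-length bound can approximate the soft filters, but it gives no formulas or operator definitions. The Python sketch below is one plausible reading under stated assumptions, not GID's actual implementation: it assumes an ObjectRank-style personalized PageRank seeded uniformly at a base set chosen by a hard filter, with damping factor d, and treats the path-length bound as counting only random walks of at most max_len hops. All names (hard_filter, authority_flow, gid_query_sketch, is_cancer_pub, is_gene, and the toy GRAPH and ATTRS) are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy data graph: each node lists its outgoing links.
GRAPH: Dict[str, List[str]] = {
    "pub1": ["geneA", "geneB"],
    "pub2": ["geneA"],
    "pub3": ["geneC"],
    "geneA": ["pub3"],
    "geneB": ["pub2"],
    "geneC": [],  # dangling node: authority reaching it is not passed on
}
# Hypothetical node attributes that the hard filters inspect.
ATTRS: Dict[str, dict] = {
    "pub1": {"type": "pub", "text": "cancer pathway"},
    "pub2": {"type": "pub", "text": "cancer genetics"},
    "pub3": {"type": "pub", "text": "diabetes study"},
    "geneA": {"type": "gene"},
    "geneB": {"type": "gene"},
    "geneC": {"type": "gene"},
}


def hard_filter(nodes: Set[str], attrs: Dict[str, dict],
                pred: Callable[[dict], bool]) -> Set[str]:
    """Selection predicate: keep only the nodes whose attributes satisfy pred."""
    return {n for n in nodes if pred(attrs[n])}


def authority_flow(graph: Dict[str, List[str]], base: Set[str],
                   d: float = 0.85, max_len: int = 10) -> Dict[str, float]:
    """Assumed ObjectRank-style authority flow seeded at the base set.

    Computes (1 - d) * sum_{i=0..max_len} (d*A)^i s, where s is uniform over
    `base` and A splits a node's authority evenly over its out-links. Only
    walks of at most max_len hops are counted, so the omitted mass is at
    most d ** (max_len + 1) in L1 norm.
    """
    if not base:
        return {n: 0.0 for n in graph}
    seed = {n: (1.0 / len(base) if n in base else 0.0) for n in graph}
    score = {n: (1 - d) * seed[n] for n in graph}  # walks of length 0
    for _ in range(max_len):  # each pass admits walks one hop longer
        nxt = {n: (1 - d) * seed[n] for n in graph}
        for u, out in graph.items():
            if out:
                share = d * score[u] / len(out)
                for v in out:
                    nxt[v] += share
        score = nxt
    return score


def gid_query_sketch(graph, attrs, base_pred, target_pred, k,
                     d=0.85, max_len=10):
    """Hypothetical pipeline: hard filter, soft filter, hard filter, top-k."""
    base = hard_filter(set(graph), attrs, base_pred)       # authority sources
    scores = authority_flow(graph, base, d, max_len)        # soft filter
    targets = hard_filter(set(graph), attrs, target_pred)  # answer type
    return sorted(targets, key=lambda n: (-scores[n], n))[:k]


def is_cancer_pub(a: dict) -> bool:
    return a["type"] == "pub" and "cancer" in a.get("text", "")


def is_gene(a: dict) -> bool:
    return a["type"] == "gene"


print(gid_query_sketch(GRAPH, ATTRS, is_cancer_pub, is_gene, k=2, max_len=3))
# prints ['geneA', 'geneC']
```

Lowering max_len trades accuracy for fewer iterations: with max_len=1, geneC (three hops from the base set) receives no authority, and the top two genes become ['geneA', 'geneB'] instead.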
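The abstract does not say how GID's graph-sampling approximation works. As a stand-in, the sketch below uses a different, well-known sampling method, Monte Carlo random-walk estimation of personalized PageRank; it is not claimed to be the paper's technique. It estimates assumed ObjectRank-style scores (damping factor d, walks seeded uniformly at a hypothetical base set {pub1, pub2}) on a hypothetical toy graph; sampled_authority is an illustrative name.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical toy data graph (node -> outgoing links); geneC is dangling.
GRAPH: Dict[str, List[str]] = {
    "pub1": ["geneA", "geneB"],
    "pub2": ["geneA"],
    "pub3": ["geneC"],
    "geneA": ["pub3"],
    "geneB": ["pub2"],
    "geneC": [],
}


def sampled_authority(graph: Dict[str, List[str]], base: Set[str],
                      d: float = 0.85, num_walks: int = 20000,
                      rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Monte Carlo estimate of ObjectRank-style authority seeded at `base`.

    Each walk starts at a uniformly chosen base node. At every step it stops
    with probability 1 - d and its end node gets one vote; otherwise it moves
    to a uniformly chosen out-neighbor. A walk that tries to leave a dangling
    node is dropped, mirroring the authority lost there in the exact
    computation. Each vote fraction is an unbiased estimate of the exact score.
    """
    if rng is None:
        rng = random.Random()
    starts = sorted(base)
    votes: Counter = Counter()
    for _ in range(num_walks):
        node = rng.choice(starts)
        while True:
            if rng.random() >= d:  # stop with probability 1 - d
                votes[node] += 1
                break
            out = graph[node]
            if not out:  # dangling node: this walk's authority is lost
                break
            node = rng.choice(out)
    return {n: votes[n] / num_walks for n in graph}


est = sampled_authority(GRAPH, {"pub1", "pub2"}, rng=random.Random(0))
top2_genes = sorted(["geneA", "geneB", "geneC"], key=lambda n: -est[n])[:2]
print(top2_genes)
```

The standard error of each estimate shrinks roughly like 1/sqrt(num_walks), and the expected work is about num_walks / (1 - d) steps regardless of how large the graph is, which is what makes sampling attractive when exact iteration over a large graph is too slow.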