Pragmatic correlation analysis for probabilistic ranking over relational data

Authors:
Jaehui Park;Sang-Goo Lee
Affiliations:
Electronics and Telecommunications Research Institute, Daejeon, South Korea;School of Computer Science and Engineering, Seoul National University, Seoul, South Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2013




Abstract

It is widely recognized that effective ranking methods for relational data (e.g., tuples) help users overcome the limitations of the traditional Boolean retrieval model and the difficulty of writing structured queries. To determine the rank of a tuple, term frequency-based methods, such as tf×idf (term frequency × inverse document frequency) schemes, have been commonly adopted in the literature by simply treating a tuple as a single document. However, we have observed that in many cases tf×idf schemes do not produce effective rankings or specific orderings for relational data with categorical attributes, which is pervasive today. To support fundamental aspects of relational data, we apply notions of correlation analysis to estimate the extent of the relationships between queries and data. This paper proposes a probabilistic ranking model that exploits the statistical relationships present in relational data with categorical attributes. Given a set of query terms, information on attribute values correlated with the query terms is used to estimate the relevance of a tuple to the query. To quantify this information, we compute the extent of the dependency between correlated attribute values on a Bayesian network. Moreover, we avoid the prohibitive cost of computing insignificant ranking features by relying on a limited assumption of node independence. Our probabilistic ranking model is domain-independent and leverages only data statistics, without any prior knowledge such as user query logs. Experimental results show that our approach improves ranking effectiveness on real-world datasets and achieves reasonable query processing efficiency compared to related work.
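
The general idea of ranking by correlated attribute values can be illustrated with a small sketch. It is not the model proposed in the paper: it replaces the Bayesian network with plain pairwise co-occurrence estimates and assumes that all unspecified attribute values are independent given each query term. The toy relation `ROWS` and the helper names `build_counts` and `rank` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy relation with categorical attributes only.
ROWS = [
    {"make": "honda", "body": "sedan", "color": "red"},
    {"make": "honda", "body": "sedan", "color": "blue"},
    {"make": "honda", "body": "suv", "color": "red"},
    {"make": "ford", "body": "truck", "color": "red"},
    {"make": "ford", "body": "truck", "color": "black"},
    {"make": "ford", "body": "sedan", "color": "black"},
]


def build_counts(rows):
    """Count single (attribute, value) items and co-occurring item pairs."""
    single = Counter()
    pair = Counter()
    for row in rows:
        items = list(row.items())
        single.update(items)
        for a, b in combinations(items, 2):
            # Store both orders so lookups do not depend on attribute order.
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return single, pair


def rank(rows, query):
    """Rank the tuples that match every query term.

    The score is the product of P(v | q) over each unspecified attribute
    value v of the tuple and each query term q, where P(v | q) is
    estimated as count(v, q) / count(q). This is a naive independence
    assumption, not the dependency computation described in the abstract.
    """
    single, pair = build_counts(rows)
    scored = []
    for row in rows:
        # Boolean filter: keep only tuples containing all query terms.
        if any(row.get(attr) != value for attr, value in query.items()):
            continue
        score = 1.0
        for q in query.items():
            for item in row.items():
                if item[0] in query:
                    continue  # skip attributes fixed by the query itself
                score *= pair[(item, q)] / single[q]
        scored.append((score, row))
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored


if __name__ == "__main__":
    for score, row in rank(ROWS, {"make": "honda"}):
        print(round(score, 3), row)
```

All three tuples with `make = honda` satisfy the Boolean query equally, so a Boolean model cannot order them. The sketch breaks the tie by preferring the tuple whose remaining values (`sedan`, `red`) co-occur most often with the query term in the data.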