WikiAnalytics: disambiguation of keyword search results on highly heterogeneous structured data

Authors:
Andrey Balmin;Emiran Curtmola
Affiliations:
IBM Almaden Research Center;UC San Diego
Venue:
Proceedings of the 13th International Workshop on the Web and Databases
Year:
2010

Abstract

Wikipedia infoboxes are an example of a seemingly structured, yet extraordinarily heterogeneous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, solutions based on keyword search are too imprecise to capture the user's intent. To address these limitations, we propose a system, WikiAnalytics, that uses a novel search paradigm to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records, deciding which are and are not relevant. WikiAnalytics uses three categories of clustering features, based respectively on the record types, fields, and values that matched the query keywords. Since the system cannot predict which combination of features will be important to the user, it efficiently generates all possible clusters of records by all sets of features. We use a novel data structure, the universal navigational lattice (UNL), that compactly encodes all possible clusters. WikiAnalytics provides a dynamic and intuitive interface that lets the user explore the UNL and construct homogeneous structured tables, which can be further queried and aggregated using conventional tools.
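
The idea of clustering records by every set of features can be illustrated with a small sketch. The abstract does not describe how the UNL is actually built, so the code below is only an assumption-laden illustration: it uses a brute-force closure over feature sets (in the spirit of a concept lattice), and the records, feature names, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy records matched by a keyword query such as "Paris".
# Each record carries features of the three kinds named in the abstract:
# record type, matched field, and matched value.
RECORDS = {
    "r1": {"type:City", "field:population", "value:Paris"},
    "r2": {"type:City", "field:mayor", "value:Paris"},
    "r3": {"type:Film", "field:title", "value:Paris"},
    "r4": {"type:City", "field:population", "value:Paris"},
}


def extent(features, records):
    """Return the records that exhibit every feature in `features`."""
    return frozenset(r for r, fs in records.items() if set(features) <= fs)


def intent(record_ids, records):
    """Return the features shared by all of the given records."""
    sets = [set(records[r]) for r in record_ids]
    if not sets:
        return frozenset()
    return frozenset(sets[0].intersection(*sets[1:]))


def build_clusters(records):
    """Map each distinct non-empty cluster to the features its records share.

    Brute force over all feature subsets (exponential in the number of
    features), so this is suitable only for toy inputs. It is NOT the
    paper's algorithm, just a way to see which clusters exist.
    """
    all_features = sorted(set().union(*records.values()))
    clusters = {}
    for k in range(len(all_features) + 1):
        for combo in combinations(all_features, k):
            ext = extent(combo, records)
            if ext:  # many feature sets collapse to the same cluster
                clusters[ext] = intent(ext, records)
    return clusters


if __name__ == "__main__":
    # Print clusters from largest to smallest, as a user might browse them.
    for ext, shared in sorted(build_clusters(RECORDS).items(),
                              key=lambda kv: -len(kv[0])):
        print(sorted(ext), "share", sorted(shared))
```

Many different feature sets select the same group of records, so storing each distinct cluster once together with its shared features keeps the structure much smaller than the number of feature combinations. A user who keeps the `type:City` cluster and rejects `r3` ends up with a homogeneous set of records that could be laid out as a table.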