SEDA: a system for search, exploration, discovery, and analysis of XML Data

Authors:
Andrey Balmin;Latha Colby;Emiran Curtmola;Quanzhong Li;Fatma Özcan;Sharath Srinivas;Zografoula Vagena
Affiliations:
IBM Almaden Research Center;IBM Almaden Research Center;UC San Diego;IBM Almaden Research Center;IBM Almaden Research Center;University of Maryland, College Park;Microsoft Research
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

Keyword search in XML repositories is a powerful tool for interactive data exploration. Much recent work has made XML search aware of the relationship information embedded in XML document structure, but no single approach is a clear winner across all data and query scenarios. Furthermore, because keyword search is inherently imprecise, its results cannot easily be analyzed and summarized to gain further insight into the data. We address these shortcomings with SEDA: a system for Search, Exploration, Discovery, and Analysis of XML Data. SEDA is based on a paradigm of search and user interaction that lets users start with simple keyword-style querying and progress to rich analysis of XML data by leveraging both the content and the structure of the data. SEDA is an interactive system that allows the user to refine her query iteratively to explore the XML data and discover interesting relationships. SEDA first employs a top-k algorithm to quickly compute the k most relevant answers, and returns tuples of nodes ranked by relevance. SEDA provides several novel data structures and techniques for efficient top-k computation over graph-structured XML data. SEDA also computes all the contexts in which the query terms are found and all the paths that connect the query terms in the XML data. These two summaries enable the user to refine her query by disambiguating the contexts and connections relevant to it. With this user feedback, the system has enough information to compute all query results, not just the top-k. From the complete results, SEDA automatically deduces a star schema, which is then instantiated with the query results and augmented with the additional values required for a well-defined data cube. The tables computed at this step are fed into an OLAP engine for further analysis.
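
The context summary mentioned in the abstract can be illustrated with a small sketch. This is not SEDA's implementation: it assumes, for illustration only, that a "context" is the root-to-node path of element names under which a query term occurs, and it ignores attributes, ranking, and graph-structured links. The function name `keyword_contexts` and the `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def keyword_contexts(xml_text, keyword):
    """Count, per root-to-node label path, the elements whose own text contains the keyword.

    Assumption: a context is modelled as a path of element names; this is an
    illustrative simplification, not the definition used by SEDA.
    """
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    counts = Counter()

    def visit(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        # Only the element's direct text is checked (case-insensitive substring match).
        if needle in (node.text or "").lower():
            counts[path] += 1
        for child in node:
            visit(child, path)

    visit(root, "")
    return counts

# Hypothetical document in which the same term appears under two different contexts.
SAMPLE = (
    "<bib>"
    "<book><title>XML Search</title><author>Smith</author></book>"
    "<article><title>Graph Search</title><editor>Smith</editor></article>"
    "</bib>"
)

if __name__ == "__main__":
    for path, n in sorted(keyword_contexts(SAMPLE, "smith").items()):
        print(path, n)
```

In this toy document the term "smith" occurs once as a book author and once as an article editor, so a user shown both paths could keep only the one relevant to her query, which is the kind of disambiguation the abstract describes.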