The SphereSearch engine for unified ranked retrieval of heterogeneous XML and web documents

Authors:
Jens Graupmann; Ralf Schenkel; Gerhard Weikum
Affiliations:
Max-Planck-Institut für Informatik, Germany (all three authors)
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Abstract

This paper presents the novel SphereSearch engine, which provides unified ranked retrieval over heterogeneous XML and Web data. Its search capabilities include vague structure conditions, text content conditions, and relevance ranking based on IR statistics and statistically quantified ontological relationships. Web pages in HTML or PDF are automatically converted into XML, optionally with semantic tags generated by linguistic annotation tools. For Web data, the XML-oriented query engine is leveraged to provide rich search options that cannot be expressed in traditional Web search engines: concept-aware and link-aware querying that takes into account the implicit structure and context of Web pages. The benefits of the SphereSearch engine are demonstrated by experiments with a large, richly tagged but non-schematic open encyclopedia extended with external documents.
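The abstract states that ranking combines IR statistics with statistically quantified ontological relationships. The following is a minimal sketch of one way such a combination could work for a concept-aware condition; it is not SphereSearch's actual scoring model. It assumes, hypothetically, that a tag condition such as `professor` is relaxed to related tags whose similarity is a Dice coefficient over co-occurrence counts, and that element content is scored with a simplified BM25-style saturating term-frequency weight. The function names `dice`, `term_score`, and `score_elements`, the `k1` value, and all statistics are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def dice(df_a: int, df_b: int, df_ab: int) -> float:
    """Dice coefficient 2*df_ab / (df_a + df_b) from (hypothetical) co-occurrence counts."""
    if df_a + df_b == 0:
        return 0.0
    return 2.0 * df_ab / (df_a + df_b)


def term_score(term: str, text: str, k1: float = 1.2) -> float:
    """Saturating tf weight tf*(k1+1)/(tf+k1), a simplified BM25-style
    component without idf or length normalization (an assumption)."""
    tf = Counter(text.lower().split())[term.lower()]
    return tf * (k1 + 1) / (tf + k1)


def score_elements(xml: str, concept: str, term: str,
                   tag_sim: dict[str, float]) -> list[tuple[str, float]]:
    """Rank elements whose tag matches `concept` exactly (similarity 1.0) or
    via a related tag listed in `tag_sim`, scoring their text for `term`."""
    root = ET.fromstring(xml)
    results = []
    for elem in root.iter():
        sim = 1.0 if elem.tag == concept else tag_sim.get(elem.tag, 0.0)
        if sim == 0.0:
            continue  # tag unrelated to the concept: condition not satisfied
        text = " ".join(elem.itertext())
        score = sim * term_score(term, text)  # ontology weight x content weight
        if score > 0.0:
            results.append((elem.tag, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical statistics: 40 docs use <professor>, 60 use <lecturer>, 30 use both.
tag_sim = {"lecturer": dice(40, 60, 30)}
doc = ("<page>"
       "<professor>works on database systems</professor>"
       "<lecturer>teaches database and database tuning</lecturer>"
       "<student>likes database courses</student>"
       "</page>")
for tag, score in score_elements(doc, "professor", "database", tag_sim):
    print(f"{tag}: {score:.3f}")
```

With these made-up numbers, the exact `<professor>` match outranks the relaxed `<lecturer>` match even though the latter contains the term twice, because the ontological similarity (0.6) discounts its content score, while `<student>` is excluded entirely. Multiplying the two factors is just one simple design choice; a real engine could weight or aggregate them differently.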