The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

Authors:
Anja Theobald;Gerhard Weikum
Venue:
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2002


Abstract

Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works well for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or the intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, however, do not consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine, which supports relevance ranking on XML data. XXL is particularly geared toward path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve search efficiency and effectiveness. XXL is fully implemented as a suite of Java servlets. Experiments with a variety of structurally diverse XML data demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.
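The contrast the abstract draws between Boolean and ranked retrieval can be illustrated with a minimal Python sketch. This is not the XXL implementation or its query language: the sample document, the toy ontology with its similarity weights, and the function names are all assumptions made for illustration. The sketch matches an element name at any depth (a stand-in for a path wildcard) and scores each match by how closely its text relates to the query term.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy ontology (an assumption, not from the paper):
# similarity of related terms to a query term, in [0, 1].
TOY_ONTOLOGY = {
    "car": {"car": 1.0, "automobile": 0.9, "vehicle": 0.6},
}

# Hypothetical sample data with matching elements at different depths.
DOC = """
<catalog>
  <item><description>red car</description></item>
  <section><item><description>used automobile</description></item></section>
  <item><description>garden chair</description></item>
  <item><description>utility vehicle</description></item>
</catalog>
""".strip()


def boolean_search(root, tag, term):
    """Boolean retrieval: elements named `tag` at any depth whose text contains `term`."""
    return [e for e in root.iter(tag) if term in (e.text or "").lower().split()]


def ranked_search(root, tag, term, ontology):
    """Ranked retrieval: score each `tag` element by its best term similarity."""
    related = ontology.get(term, {term: 1.0})
    hits = []
    for elem in root.iter(tag):  # iter() visits descendants at any depth
        words = (elem.text or "").lower().split()
        score = max((related.get(w, 0.0) for w in words), default=0.0)
        if score > 0.0:
            hits.append((score, elem))
    # Descending order of estimated relevance.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


root = ET.fromstring(DOC)
exact = [e.text for e in boolean_search(root, "description", "car")]
ranked = [(s, e.text) for s, e in ranked_search(root, "description", "car", TOY_ONTOLOGY)]

print(exact)
for score, text in ranked:
    print(score, text)
```

In this toy setting the Boolean search returns only the element that literally contains the query term, while the ranked search also returns the elements mentioning related terms, ordered by their assumed similarity weights.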