More and more (semi-)structured information is becoming available on the web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). Hundreds of millions of such documents are already accessible, and their number is growing rapidly. This calls for large-scale systems that provide effective means of searching and retrieving this semi-structured information, with the ultimate goal of making it exploitable by humans and machines alike. This article examines the shift from the traditional web document model to a web data object (entity) model and studies the challenges faced in implementing a scalable, high-performance system for searching semi-structured data objects over a large, heterogeneous and decentralised infrastructure. Towards this goal, we define an entity retrieval model, develop novel methodologies for supporting this model and show how to achieve a high-performance entity retrieval system. We introduce an indexing methodology for semi-structured data which, compared to other approaches, offers a good compromise between query expressiveness, query processing efficiency and index maintenance. We address performance by optimising the index data structure with appropriate compression techniques. Finally, we demonstrate that the resulting system can index billions of data objects and provides keyword-based as well as more advanced search interfaces for retrieving relevant data objects in sub-second time. This work has been part of the Sindice search engine project at the Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice system currently maintains more than 200 million pages downloaded from the web and is actively used by many researchers within and outside of DERI.
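The abstract says the index is optimised with compression but does not name a scheme. The following is a generic sketch of the idea only, a textbook baseline for compressing an inverted list. Sorted document identifiers are replaced by the gaps between them (delta encoding), and each gap is then stored in variable-byte form: 7 payload bits per byte, with the high bit set when more bytes follow. The function names and the example IDs are hypothetical. The sketch is not claimed to be the compression scheme used in Sindice.

```python
from itertools import accumulate
from typing import List


def delta_encode(doc_ids: List[int]) -> List[int]:
    """Replace strictly increasing document IDs by the gaps between them.

    The first ID is kept as-is (its gap from an implicit 0).
    """
    gaps: List[int] = []
    prev = None
    for doc_id in doc_ids:
        if doc_id < 0 or (prev is not None and doc_id <= prev):
            raise ValueError("doc IDs must be non-negative and strictly increasing")
        gaps.append(doc_id if prev is None else doc_id - prev)
        prev = doc_id
    return gaps


def delta_decode(gaps: List[int]) -> List[int]:
    """Rebuild the original IDs with a running sum over the gaps."""
    return list(accumulate(gaps))


def vbyte_encode(numbers: List[int]) -> bytes:
    """Variable-byte encode non-negative integers, least significant 7 bits first.

    Every byte except the last one of a number has its high bit set.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)  # 7 payload bits + continuation flag
            n >>= 7
        out.append(n)  # final byte of this number: high bit clear
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode."""
    numbers: List[int] = []
    value, shift = 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes follow for this number
        else:
            numbers.append(value)
            value, shift = 0, 0
    if shift:
        raise ValueError("truncated input: last number is incomplete")
    return numbers


if __name__ == "__main__":
    postings = [3, 7, 150, 1000]  # hypothetical document IDs for one term
    gaps = delta_encode(postings)
    encoded = vbyte_encode(gaps)
    print(gaps, len(encoded))  # [3, 4, 143, 850] 6
    assert delta_decode(vbyte_decode(encoded)) == postings
```

Because gaps are usually much smaller than the raw identifiers, most of them fit in one or two bytes. In the example, four IDs that would take 16 bytes as 32-bit integers are stored in 6 bytes. Variable-byte coding is chosen here only for its simplicity. A production index may use other codecs that trade compression ratio against decoding speed differently.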