In this chapter we present the main data structures and algorithms for searching large text collections. We emphasize inverted files, the most widely used index, but also review suffix arrays, which are useful in a number of specialized applications. We also cover parallel and distributed implementations of these two structures. As an example, we show how mechanisms based on inverted files can be used to index and search the Web.
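For orientation, here is a minimal in-memory Python sketch of the two structures named above. It is not the chapter's implementation. The inverted file maps each term to a list of (document id, positions) postings and answers conjunctive queries by merging sorted posting lists. The suffix array sorts every suffix start position of a text and finds all occurrences of a pattern by binary search. The tokenizer, all function names, and the sample documents are illustrative assumptions. The sketch ignores posting-list compression, on-disk construction, and the parallel and distributed variants the chapter discusses.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(docs):
    """Map each term to a list of (doc_id, [positions]) sorted by doc_id."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        positions = defaultdict(list)
        for pos, term in enumerate(tokenize(text)):
            positions[term].append(pos)
        for term, plist in positions.items():
            # Documents are visited in increasing id order, so lists stay sorted.
            index[term].append((doc_id, plist))
    return dict(index)


def intersect(a, b):
    """Linear merge of two sorted lists of document ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive_query(index, terms):
    """Return ids of documents that contain every query term."""
    lists = [[doc_id for doc_id, _ in index.get(t, [])] for t in terms]
    if not lists:
        return []
    lists.sort(key=len)  # shortest list first keeps intermediate results small
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])  # matching text positions, in text order


# Hypothetical sample collection for illustration.
docs = [
    "inverted files index text",
    "suffix arrays index every text position",
    "the web is a large text collection",
]
index = build_inverted_file(docs)
print(conjunctive_query(index, ["index", "text"]))  # [0, 1]

sa = build_suffix_array("banana")
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

Starting the merge from the shortest posting list is a common heuristic, because the intermediate result can never be longer than that list. The naive suffix-array construction copies every suffix and is only suitable for small texts. The binary search works because sorting suffixes also sorts their length-m prefixes.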