The Johnson-Lindenstrauss Lemma and the sphericity of some graphs
Journal of Combinatorial Theory Series A
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Large-scale information retrieval with latent semantic indexing
Information Sciences: an International Journal
Latent semantic indexing: a probabilistic analysis
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A semidiscrete matrix decomposition for latent semantic indexing information retrieval
ACM Transactions on Information Systems (TOIS)
Understanding search engines: mathematical modeling and text retrieval
Understanding search engines: mathematical modeling and text retrieval
Cluster-based language models for distributed retrieval
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
The Grid File: An Adaptable, Symmetric Multikey File Structure
ACM Transactions on Database Systems (TODS)
GlOSS: text-source discovery over the Internet
ACM Transactions on Database Systems (TODS)
Proceedings of the ninth international conference on Information and knowledge management
Collection selection and results merging with topically organized U.S. patents and TREC data
Proceedings of the ninth international conference on Information and knowledge management
A vector space model for automatic indexing
Communications of the ACM
A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Random projection in dimensionality reduction: applications to image and text data
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Concept Decompositions for Large Sparse Text Data Using Clustering
Machine Learning
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
SETS: search enhanced by topic segmentation
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Peer-to-peer information retrieval using self-organizing semantic overlay networks
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Implementation of the SMART Information Retrieval System
Implementation of the SMART Information Retrieval System
Orthogonal locality preserving indexing
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Hipikat: A Project Memory for Software Development
IEEE Transactions on Software Engineering
Search strategies for scientific collaboration networks
Proceedings of the 2005 ACM workshop on Information retrieval in peer-to-peer networks
Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval
Efficient query routing for information retrieval in semantic overlays
Proceedings of the 2006 ACM symposium on Applied computing
Information retrieval in a peer-to-peer environment
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Very sparse random projections
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Novel applications of information retrieval techniques to peer-to-peer file-sharing systems
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
Enhancing Search Performance on Gnutella-Like P2P Systems
IEEE Transactions on Parallel and Distributed Systems
Survey of research towards robust peer-to-peer networks: search methods
Computer Networks: The International Journal of Computer and Telecommunications Networking
Web text retrieval with a P2P query-driven index
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
SemreX: Efficient search in a semantic overlay for literature retrieval
Future Generation Computer Systems
Contention-based performance evaluation of multidimensional range search in peer-to-peer networks
Proceedings of the 2nd international conference on Scalable information systems
A Latent Semantic Indexing-based approach to multilingual document clustering
Decision Support Systems
Query-driven indexing for scalable peer-to-peer text retrieval
Future Generation Computer Systems
Contention-based performance evaluation of multidimensional range search in peer-to-peer networks
Future Generation Computer Systems
Content-based search using self-organizing peer-to-peer network
SEPADS'08 Proceedings of the 7th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems
GRaSP: generalized range search in peer-to-peer networks
Proceedings of the 3rd international conference on Scalable information systems
ORION - Ontology-based queRy routIng in Overlay Networks
Journal of Parallel and Distributed Computing
A protocol for self-organizing peer-to-peer network supporting content-based search
WSEAS Transactions on Information Science and Applications
On building and updating distributed LSI for p2p systems
ISPA'05 Proceedings of the 2005 international conference on Parallel and Distributed Processing and Applications
DPTree: a distributed pattern tree index for partial-match queries in peer-to-peer networks
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Towards a common framework for peer-to-peer web retrieval
From Integrated Publication and Information Systems to Virtual Information and Knowledge Environments
SemreX: a semantic peer-to-peer system for literature documents retrieval
ASWC'06 Proceedings of the First Asian conference on The Semantic Web
Understanding and enhancing the folding-in method in latent semantic indexing
DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine information retrieval with peer-to-peer technology for scalability, fault tolerance, and low administration cost. One pioneering work in this direction is pSearch [32, 33], which places documents onto a peer-to-peer overlay network according to semantic vectors produced by Latent Semantic Indexing (LSI). This placement reduces the search cost of a query, because documents related to the query are likely to be co-located on a small number of nodes. Unfortunately, because it relies on LSI, pSearch also inherits LSI's limitations: (1) when the corpus is large and heterogeneous, LSI's retrieval quality is inferior to that of methods such as Okapi; (2) the Singular Value Decomposition (SVD) used in LSI scales poorly in both memory consumption and computation time.

This paper addresses these limitations and makes the following contributions. (1) To reduce the cost of SVD, we reduce the size of its input matrix through document clustering and term selection. Our method retains the retrieval quality of LSI but is several orders of magnitude more efficient. (2) Through extensive experimentation, we found that proper normalization of the semantic vectors for terms and documents improves recall by 76%. (3) To further improve retrieval quality, we use low-dimensional subvectors of the semantic vectors to cluster documents in the overlay, and then use Okapi to guide the search and document selection.
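The idea behind the first two contributions can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the abstract does not specify the clustering method, the term-selection criterion, or the normalization scheme, so the choices here (spherical k-means with farthest-first seeding, keeping the terms with the largest total weight, unit-length normalization) and all function names are assumptions made for illustration. The point it shows is that the SVD is computed on a small clusters-by-selected-terms matrix, and documents and queries are then projected into the resulting semantic space.

```python
import numpy as np

def normalize_rows(m):
    """Scale each row to unit Euclidean length (all-zero rows stay zero)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def cluster_centroids(docs, k, iters=20):
    """Spherical k-means with deterministic farthest-first seeding (assumed method)."""
    chosen = [0]
    while len(chosen) < k:
        # Pick the document least similar to every centroid chosen so far.
        sims = (docs @ docs[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(sims)))
    centroids = docs[chosen].copy()
    for _ in range(iters):
        labels = np.argmax(docs @ centroids.T, axis=1)
        for j in range(k):
            members = docs[labels == j]
            if len(members):  # leave a centroid unchanged if its cluster is empty
                centroids[j] = members.mean(axis=0)
        centroids = normalize_rows(centroids)
    return centroids

def reduced_lsi(docs, k_clusters, n_terms, rank):
    """Fit an LSI basis from a (clusters x selected terms) matrix."""
    docs = normalize_rows(np.asarray(docs, dtype=float))
    # Term selection (assumed heuristic): keep the n_terms with the largest total weight.
    keep = np.sort(np.argsort(docs.sum(axis=0))[-n_terms:])
    sub = normalize_rows(docs[:, keep])
    centroids = cluster_centroids(sub, k_clusters)
    # SVD of the small centroid matrix instead of the full document matrix.
    _, _, vt = np.linalg.svd(centroids, full_matrices=False)
    basis = vt[:rank].T  # shape: (selected terms, rank)
    return keep, basis

def project(vectors, keep, basis):
    """Map term-space rows into the semantic space and normalize to unit length."""
    return normalize_rows(np.asarray(vectors, dtype=float)[:, keep] @ basis)

# Toy corpus: rows are documents, columns are term weights (two topics, one rare term).
docs = np.array([
    [2, 1, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0, 0],
    [1, 1, 2, 0, 0, 0, 1],
    [0, 0, 0, 2, 1, 1, 0],
    [0, 0, 0, 1, 2, 1, 0],
    [0, 0, 0, 1, 1, 2, 0],
])
keep, basis = reduced_lsi(docs, k_clusters=2, n_terms=6, rank=2)
doc_vecs = project(docs, keep, basis)
query_vec = project(np.array([[1, 0, 0, 0, 0, 0, 0]]), keep, basis)
print((doc_vecs @ query_vec.T).ravel())  # cosine similarity of each document to the query
```

In this toy setting the SVD input has 2 rows instead of 6; on a real corpus the reduction would come from having far fewer clusters than documents and far fewer selected terms than vocabulary entries.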