Site-based or topic-specific search engines work with mixed success. The information retrieval task is difficult in general, and these engines lack the good link information needed to identify authorities. We advocate an open-source approach to the problem because of its scope and the range of software components it requires. We have adopted a topic-based search engine because it represents the next generation of capability. This paper outlines our scalable system for site-based or topic-specific search. It also demonstrates the system, which is still under development, on a small collection of 250,000 EU and UN web pages.