Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval in the World-Wide Web: making client-based searching feasible
Selected papers of the first conference on World-Wide Web
Improved algorithms for topic distillation in a hyperlinked environment
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Learning to extract symbolic knowledge from the World Wide Web
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Efficient crawling through URL ordering
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Grouper: a dynamic clustering interface to Web search results
WWW '99 Proceedings of the eighth international conference on World Wide Web
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
A system for automatic personalized tracking of scientific literature on the Web
Proceedings of the fourth ACM conference on Digital libraries
Data mining: practical machine learning tools and techniques with Java implementations
Data mining: practical machine learning tools and techniques with Java implementations
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Bringing order to the Web: automatically categorizing search results
Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Adaptive Retrieval Agents: Internalizing Local Contextand Scaling up to the Web
Machine Learning - Special issue on information retrieval
Intelligent crawling on the World Wide Web with arbitrary predicates
Proceedings of the 10th international conference on World Wide Web
Personalized spiders for web search and analysis
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Evaluating topic-driven web crawlers
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Accelerated focused crawling through online relevance feedback
Proceedings of the 11th international conference on World Wide Web
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Information navigation on the web by clustering and summarizing query results
Information Processing and Management: an International Journal
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Evaluating contents-link coupled web page clustering for web search results
Proceedings of the eleventh international conference on Information and knowledge management
Evaluation of hierarchical clustering algorithms for document datasets
Proceedings of the eleventh international conference on Information and knowledge management
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Focused Crawling Using Context Graphs
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Focused Crawls, Tunneling, and Digital Libraries
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Stochastic models for the Web graph
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
The Web-DL environment for building digital libraries from the Web
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
The SMART Retrieval System—Experiments in Automatic Document Processing
The SMART Retrieval System—Experiments in Automatic Document Processing
Digital libraries' support for the user's 'information journey'
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
What's there and what's not?: focused crawling for missing documents in digital libraries
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Learning to crawl: Comparing classification schemes
ACM Transactions on Information Systems (TOIS)
Link Contexts in Classifier-Guided Topical Crawlers
IEEE Transactions on Knowledge and Data Engineering
Using HMM to learn user browsing patterns for focused web crawling
Data & Knowledge Engineering - Special issue: WIDM 2004
Agreeing to disagree: search engines and their public interfaces
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
The impact of term selection in genre-aware focused crawling
Proceedings of the 2008 ACM symposium on Applied computing
Enhancing digital libraries using missing content analysis
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
CRAWLING THE CONSTRUCTION WEB-A MACHINE-LEARNING APPROACH WITHOUT NEGATIVE EXAMPLES
Applied Artificial Intelligence
Metadata domain-knowledge driven search engine in "HyperManyMedia" E-learning resources
CSTST '08 Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology
Improving the performance of focused web crawlers
Data & Knowledge Engineering
A Genre-Aware Approach to Focused Crawling
World Wide Web
Metadata as seeds for building an ontology driven information retrieval system
International Journal of Hybrid Intelligent Systems
Exploiting genre in focused crawling
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Beyond digital incunabula: modeling the next generation of digital libraries
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries
PROBABILISTIC MODELS FOR FOCUSED WEB CRAWLING
Computational Intelligence
Hi-index | 0.00 |
A large amount of research, technical and professional documents are available today in digital formats Digital libraries are created to facilitate search and retrieval of information supplied by the documents. These libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. While tools that index, classify, rank and retrieve documents from such libraries are important, it would be worthwhile to complement these tools with information available on the Web. We propose one such technique that uses a topical crawler driven by the information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages that are focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques to both guide the crawler as well as analyze the collection it fetches. A report is generated at the end that provides visual cues and information to the researcher.