The number of vertical search engines and portals has increased rapidly in recent years, making the importance of topic-driven (focused) crawlers self-evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. Our implementation takes a different approach to focused crawling and aims to overcome the limitation imposed by the need to provide initial training data, while maintaining high recall and precision. We compare its efficiency with that of other well-known web information retrieval techniques.
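The abstract does not spell out how the classifier works. The following sketch is offered only as a rough illustration of the general latent semantic indexing (LSI) idea applied to crawl ordering, under stated assumptions:

1. Build a term-document matrix from a handful of seed texts that describe the topic.
2. Take its truncated SVD.
3. Fold new text into the k-dimensional latent space as S_k^{-1} U_k^T d.
4. Rank unvisited URLs by cosine similarity to the centroid of the seed texts.

"Combining link analysis with text content" is modeled here as a hypothetical weighted mix of the anchor-text score and the parent page's score, controlled by `alpha`. The class names `LSITopicScorer` and `FocusedFrontier`, the log term weighting, the seed-text topic description and the mixing rule are all assumptions, not the authors' method. The sketch requires NumPy.

```python
import heapq
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class LSITopicScorer:
    """Hypothetical scorer: similarity of text to a topic described by seed texts."""

    def __init__(self, seed_docs, k=2):
        tokenized = [tokenize(d) for d in seed_docs]
        self.vocab = sorted({t for doc in tokenized for t in doc})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        # Term-document matrix A (terms x documents), log-scaled term frequencies.
        A = np.column_stack([self._vector(doc) for doc in tokenized])
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        # Singular values are sorted descending; skip numerically zero ones.
        k = min(k, int((s > 1e-10).sum()))
        self.U_k, self.s_k = U[:, :k], s[:k]
        # Topic centroid: mean of the seed documents folded into the latent space.
        self.centroid = np.mean([self._fold_in(col) for col in A.T], axis=0)

    def _vector(self, tokens):
        v = np.zeros(len(self.vocab))
        for term, count in Counter(tokens).items():
            if term in self.index:  # out-of-vocabulary terms are ignored
                v[self.index[term]] = 1.0 + np.log(count)
        return v

    def _fold_in(self, d):
        # Project a term vector into k latent dimensions: S_k^{-1} U_k^T d.
        return (self.U_k.T @ d) / self.s_k

    def score(self, text):
        """Cosine similarity between `text` and the topic centroid in LSI space."""
        z = self._fold_in(self._vector(tokenize(text)))
        denom = np.linalg.norm(z) * np.linalg.norm(self.centroid)
        return float(z @ self.centroid / denom) if denom > 0 else 0.0


class FocusedFrontier:
    """Best-first crawl frontier (hypothetical weighting):

    priority(url) = alpha * score(anchor text) + (1 - alpha) * score(parent text)
    """

    def __init__(self, scorer, alpha=0.6):
        self.scorer, self.alpha = scorer, alpha
        self.heap, self.seen = [], set()

    def add_links(self, parent_text, links):
        """Queue (url, anchor_text) pairs found on a fetched parent page."""
        parent = self.scorer.score(parent_text)
        for url, anchor in links:
            if url in self.seen:  # enqueue each URL at most once
                continue
            self.seen.add(url)
            p = self.alpha * self.scorer.score(anchor) + (1 - self.alpha) * parent
            heapq.heappush(self.heap, (-p, url))  # negate: heapq is a min-heap

    def pop(self):
        """Return the most promising (url, priority) pair, or None if empty."""
        if not self.heap:
            return None
        neg_p, url = heapq.heappop(self.heap)
        return url, -neg_p


if __name__ == "__main__":
    seeds = [
        "focused crawler topic web pages relevance",
        "latent semantic indexing web documents terms",
        "web crawler link analysis hubs authorities",
    ]
    frontier = FocusedFrontier(LSITopicScorer(seeds, k=2))
    frontier.add_links(
        "a page about web crawler design and link analysis",
        [("http://example.org/crawl", "focused web crawler"),
         ("http://example.org/cake", "chocolate cake recipe")],
    )
    print(frontier.pop())
```

The example URLs and seed texts are placeholders. An anchor that shares no vocabulary with the seeds scores 0, so its priority falls back entirely on the parent page's relevance. Any real choice of `alpha` and `k` would need tuning against labeled crawls.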