Topical crawlers are becoming important tools to support applications such as specialized Web portals, online searching, and competitive intelligence. A topic-driven crawler chooses the best URLs to pursue during a crawl, but it is difficult to judge which of the downloaded URLs are the best. This paper presents several metrics and an evaluation function that ranks URLs by page relevance. We also describe an approach to tuning the function with a genetic algorithm (GA): the evolutionary process discovers the best combination of the metrics' weights. To keep a single topic from skewing the result, we present a method that extracts the characteristics of the metric combinations by mining frequent patterns. Feature extraction uses the novel FP-tree structure and the FP-growth mining method, which works on the FP-tree without candidate generation. Experiments show promising performance, especially for a popular topic.
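As an illustration of the idea, the following is a minimal sketch of how a GA might search for metric weights in a weighted-sum URL evaluation function. The abstract does not specify the metrics, the fitness measure, or the genetic operators, so everything here is an assumption made for illustration: the function names (`score`, `fitness`, `evolve_weights`), the use of precision among the top-k ranked sample URLs as fitness, truncation selection, arithmetic crossover, and Gaussian mutation.

```python
import random

def score(metrics, weights):
    """Weighted sum of one URL's metric values (assumed form of the evaluation function)."""
    return sum(w * m for w, m in zip(weights, metrics))

def fitness(weights, samples, k):
    """Fraction of the k highest-scored sample URLs that are labelled relevant.

    samples is a list of (metric_vector, label) pairs with label 1 (relevant) or 0.
    This fitness measure is an assumption, not taken from the paper.
    """
    ranked = sorted(samples, key=lambda s: score(s[0], weights), reverse=True)
    return sum(label for _, label in ranked[:k]) / k

def evolve_weights(samples, n_metrics, k, pop_size=20, generations=30, seed=0):
    """Evolve a weight vector in [0, 1]^n_metrics with a simple elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_metrics)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged.
        pop.sort(key=lambda w: fitness(w, samples, k), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            # Arithmetic crossover: a random convex combination of two parents.
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            # Gaussian mutation of one randomly chosen weight, clamped to [0, 1].
            i = rng.randrange(n_metrics)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, samples, k))

# Hypothetical sample: two metrics per URL; the first tracks relevance, the second does not.
samples = [((0.9, 0.1), 1), ((0.8, 0.2), 1), ((0.2, 0.9), 0), ((0.1, 0.8), 0)]
best = evolve_weights(samples, n_metrics=2, k=2)
```

In this toy setting the GA only needs to learn that the first metric deserves the larger weight. Running the search once per topic and collecting the resulting weight vectors would give the input over which frequent patterns could then be mined.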