Characterization of evaluation metrics in topical web crawling based on genetic algorithm

Authors:
Tao Peng;Wanli Zuo;Yilin Liu
Affiliations:
College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China;College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China;College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China
Venue:
ICNC'05 Proceedings of the First international conference on Advances in Natural Computation - Volume Part II
Year:
2005

Citing 8
Cited 0

Information retrieval in the World-Wide Web: making client-based searching feasible

Selected papers of the first conference on World-Wide Web
Tackling Real-Coded Genetic Algorithms: Operators and Tools for Behavioural Analysis

Artificial Intelligence Review
Efficient crawling through URL ordering

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The shark-search algorithm. An application: tailored Web site mapping

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Adaptive Retrieval Agents: Internalizing Local Context and Scaling up to the Web

Machine Learning - Special issue on information retrieval
Data mining: concepts and techniques

Data mining: concepts and techniques
Genetic Algorithms in Search, Optimization and Machine Learning

Genetic Algorithms in Search, Optimization and Machine Learning

Abstract

Topical crawlers are becoming important tools for supporting applications such as specialized Web portals, online searching, and competitive intelligence. A topic-driven crawler chooses the best URLs to pursue during web crawling, but it is difficult to evaluate which of the downloaded URLs are the best. This paper presents several important metrics and an evaluation function for ranking URLs by page relevance. We also discuss an approach to evaluating the function based on a genetic algorithm (GA): the GA's evolutionary process can discover the best combination of the metrics' weights. To avoid results being skewed by a single topic, this paper presents a method in which the characterization of the metrics' combination is extracted by mining frequent patterns. Feature extraction adopts a novel FP-tree structure and the FP-growth mining method, which operates on the FP-tree without candidate generation. The experiments show that the performance is encouraging, especially for a popular topic.
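
The weight search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's metrics, evaluation function, or GA operators, so everything below is an assumption made for illustration: the toy data in `PAGES`, the weighted-sum `score`, the `fitness` measure (mean score of relevant pages minus mean score of irrelevant ones), and the blend crossover and Gaussian mutation.

```python
import random

# Hypothetical toy data: each page has three metric values in [0, 1]
# (placeholders, not the paper's metrics) and a relevance label (1 or 0).
PAGES = [
    ((0.9, 0.2, 0.7), 1),
    ((0.8, 0.1, 0.6), 1),
    ((0.7, 0.9, 0.5), 1),
    ((0.2, 0.8, 0.4), 0),
    ((0.3, 0.9, 0.1), 0),
    ((0.1, 0.7, 0.3), 0),
]


def normalize(weights):
    """Rescale positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def score(weights, metrics):
    """Assumed evaluation function: a weighted sum of the metric values."""
    return sum(w * m for w, m in zip(weights, metrics))


def fitness(weights):
    """Assumed fitness: how far relevant pages outscore irrelevant ones on average."""
    rel = [score(weights, m) for m, label in PAGES if label == 1]
    irr = [score(weights, m) for m, label in PAGES if label == 0]
    return sum(rel) / len(rel) - sum(irr) / len(irr)


def crossover(a, b, rng):
    """Blend crossover for real-coded chromosomes."""
    alpha = rng.random()
    return normalize([alpha * x + (1 - alpha) * y for x, y in zip(a, b)])


def mutate(weights, rng, rate=0.2, sigma=0.1):
    """Perturb some genes with Gaussian noise, keeping every weight positive."""
    out = [max(1e-6, w + rng.gauss(0, sigma)) if rng.random() < rate else w
           for w in weights]
    return normalize(out)


def evolve(n_metrics=3, pop_size=20, generations=40, seed=0):
    """Evolve a weight vector; the best individual always survives (elitism)."""
    rng = random.Random(seed)
    # Start from uniform weights plus random individuals.
    pop = [[1.0 / n_metrics] * n_metrics]
    pop += [normalize([rng.random() + 1e-6 for _ in range(n_metrics)])
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [pop[0]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("weights:", [round(w, 3) for w in best], "fitness:", round(fitness(best), 3))
```

Because the population is seeded with uniform weights and the best individual is always carried over, the returned weights never score worse than the uniform baseline under this assumed fitness. Repeating such a run per topic would yield one weight vector per topic, which is the kind of input the frequent-pattern mining step would then summarize.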