Editorial: A topic-specific crawling strategy based on semantics similarity

Authors:
Yajun Du; Qiangqiang Pen; Zhaoqiong Gao
Venue:
Data & Knowledge Engineering
Year:
2013

References cited: 32
Cited by: 0

Evaluating topic-driven web crawlers. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
Formal Concept Analysis: Mathematical Foundations.
Exploiting hierarchical domain structure to compute similarity. ACM Transactions on Information Systems (TOIS).
Focused Crawling Using Context Graphs. VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases.
Link Contexts in Classifier-Guided Topical Crawlers. IEEE Transactions on Knowledge and Data Engineering.
Pair-Wise entity resolution: overview and challenges. CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management.
Using HMM to learn user browsing patterns for focused web crawling. Data & Knowledge Engineering - Special issue: WIDM 2004.
A machine learning approach to web page filtering using content and structure analysis. Decision Support Systems.
Concept similarity in Formal Concept Analysis: An information content approach. Knowledge-Based Systems.
Search Engines: Information Retrieval in Practice.
Partially constructed knowledge for semantic query. Expert Systems with Applications: An International Journal.
Many-Valued Concept Lattices for Conceptual Clustering and Information Retrieval. Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence.
Improving the performance of focused web crawlers. Data & Knowledge Engineering.
SCTWC: An online semi-supervised clustering approach to topical web crawlers. Applied Soft Computing.
Topic-specific crawling on the Web with the measurements of the relevancy context graph. Information Systems.
Design and evaluation of improvement method on the web information navigation - A stochastic search approach. Decision Support Systems.
Strategy for mining association rules for web pages based on formal concept analysis. Applied Soft Computing.
OntoCrawler: A focused crawler with ontology-supported website models for information agents. Expert Systems with Applications: An International Journal.
Scaling up top-K cosine similarity search. Data & Knowledge Engineering.
A relational vector space model using an advanced weighting scheme for image retrieval. Information Processing and Management: an International Journal.
An architecture for a focused trend parallel Web crawler with the application of clickstream analysis. Information Sciences: an International Journal.
Using concept lattices for text retrieval and mining. Formal Concept Analysis.
Updating broken web links: An automatic recommendation system. Information Processing and Management: an International Journal.
Conceptual knowledge retrieval with FooCA: improving web search engine results with contexts and concept hierarchies. ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining.
Ontology-based concept similarity in Formal Concept Analysis. Information Sciences: an International Journal.
Probabilistic Models for Focused Web Crawling. Computational Intelligence.
Semantic ranking of web pages based on formal concept analysis. Journal of Systems and Software.
Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks: The International Journal of Computer and Telecommunications Networking.
Reprint of: Efficient crawling through URL ordering. Computer Networks: The International Journal of Computer and Telecommunications Networking.
A new case-based classification using incremental concept lattice knowledge. Data & Knowledge Engineering.
FoCUS: Learning to Crawl Web Forums. IEEE Transactions on Knowledge and Data Engineering.
Review: Formal Concept Analysis in knowledge processing: A survey on models and techniques. Expert Systems with Applications: An International Journal.

Abstract

With the Internet growing exponentially, search engines face unprecedented challenges. A focused search engine selectively seeks out web pages that are relevant to a user's topic, and determining the best crawling strategy for focused search is a crucial and active research topic. At present, the rank values of unvisited web pages are computed from hyperlinks (as in the PageRank algorithm), from a Vector Space Model, or from a combination of the two, rather than from the semantic relations between the user topic and the unvisited pages. In this paper, we propose a concept context graph that stores the knowledge context derived from the user's history of clicked web pages and guides a focused crawler in its next crawl. The concept context graph provides a novel semantic ranking that steers the crawler toward web pages highly relevant to the user's topic. By computing the concept distance and concept similarity among the concepts of the concept context graph, and by matching unvisited web pages against the graph, we compute the rank values of the unvisited pages and pick out the relevant hyperlinks. Additionally, we build a focused crawling system and measure the precision, recall, average harvest rate, and F-measure of our approach against Breadth First, Cosine Similarity, the Link Context Graph, and the Relevancy Context Graph. The results show that our proposed method outperforms these methods.
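The abstract names the ingredients of the ranking (concept similarity, concept distance, and matching unvisited pages against the concept context graph) but not the formulas, so the following Python block is only a minimal sketch of how a semantic frontier ordering and the reported metrics might fit together, under loudly stated assumptions. The concept context graph is modeled as a hypothetical dict of concept term sets plus an undirected adjacency list; concept similarity is approximated by Jaccard overlap of term sets, concept distance by hop count from the topic concept, and the score `similarity / (1 + distance)` is an invented stand-in, not the authors' formula. All function names (`bfs_distances`, `jaccard`, `semantic_score`, `order_frontier`, `harvest_rate`, `precision_recall`, `f_measure`) are hypothetical. The evaluation helpers follow common definitions (harvest rate as the fraction of fetched pages judged relevant, F-measure as the harmonic mean of precision and recall); the paper may define them differently, for instance by averaging harvest rate over crawl intervals.

```python
import heapq
from collections import deque


def bfs_distances(graph, root):
    """Hop distance from `root` to every concept reachable in an undirected concept graph."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def jaccard(a, b):
    """Term-set overlap in [0, 1]; a stand-in for the paper's concept similarity (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def semantic_score(page_terms, concepts, distances):
    """Hypothetical rank value: best concept match, damped by that concept's distance from the topic."""
    best = 0.0
    for name, terms in concepts.items():
        if name not in distances:
            continue  # concept is not connected to the topic concept
        best = max(best, jaccard(page_terms, terms) / (1 + distances[name]))
    return best


def order_frontier(links, concepts, graph, topic):
    """Best-first order of unvisited URLs: highest score first, ties broken by URL."""
    distances = bfs_distances(graph, topic)
    # heapq is a min-heap, so scores are negated to pop the best link first.
    heap = [(-semantic_score(terms, concepts, distances), url) for url, terms in links.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def harvest_rate(judgments):
    """Fraction of fetched pages judged relevant (common definition; assumption)."""
    return sum(judgments) / len(judgments) if judgments else 0.0


def precision_recall(retrieved, relevant):
    """Set-based precision and recall of the retrieved pages."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    # Toy concept context graph centered on the topic concept "crawler" (invented data).
    concepts = {
        "crawler": {"crawler", "spider", "frontier"},
        "ranking": {"pagerank", "rank", "link"},
        "semantics": {"concept", "lattice", "similarity"},
    }
    graph = {"crawler": ["ranking"], "ranking": ["crawler", "semantics"], "semantics": ["ranking"]}
    links = {
        "http://example.org/a": {"crawler", "frontier", "queue"},
        "http://example.org/b": {"pagerank", "link", "graph"},
        "http://example.org/c": {"recipe", "cake"},
    }
    print(order_frontier(links, concepts, graph, "crawler"))
    # ['http://example.org/a', 'http://example.org/b', 'http://example.org/c']
    print(harvest_rate([True, True, False, True]))  # 0.75
```

Ordering the frontier with a priority queue is the usual best-first design for focused crawlers: the crawler always expands the highest-scoring unvisited link next, so replacing `semantic_score` with another scorer (for example, the cosine-similarity baseline the abstract mentions) changes the crawl policy without touching the rest of the loop. A production crawler would push links into the heap as they are discovered rather than scoring a fixed batch.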