IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
This paper presents an approach for the automatic assessment of web sites in large-scale e-Government surveys. The approach aims to supplement, and to some extent replace, the human evaluation that is typically the core of these surveys. The heart of the solution is a colony-inspired algorithm, called the lost sheep, which automatically locates targeted governmental material online. The algorithm centers on classifying link texts to decide whether a web page should be downloaded for further analysis. It is designed to work with minimal human interaction and to make the best possible use of the available resources. With the lost sheep, the people carrying out a survey only need to provide sample data from a few web sites for each type of material sought. The algorithm then automatically locates the same type of material in the other web sites that are part of the survey. In this way it significantly reduces the need for manual work in large-scale e-Government surveys.
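The idea of classifying link texts to decide whether a page is worth downloading can be illustrated with a small sketch. The abstract does not specify the classifier or the colony-inspired search that the lost sheep uses, so the code below is not the authors' method: it substitutes a plain naive Bayes classifier over link-text tokens, and every name in it (`LinkTextClassifier`, `should_download`, the sample link texts) is a hypothetical chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a link text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LinkTextClassifier:
    """Tiny naive Bayes over link-text tokens (illustrative stand-in only)."""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, link_text, is_target):
        """Record one labelled sample link text."""
        tokens = tokenize(link_text)
        self.token_counts[is_target].update(tokens)
        self.doc_counts[is_target] += 1
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        # Laplace-smoothed class prior.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_tokens = sum(self.token_counts[label].values())
        vocab_size = len(self.vocab) or 1
        for tok in tokens:
            # Laplace-smoothed token likelihood.
            count = self.token_counts[label][tok]
            score += math.log((count + 1) / (total_tokens + vocab_size))
        return score

    def should_download(self, link_text):
        """Return True if the link text looks like the sought material."""
        tokens = tokenize(link_text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical sample data: link texts from a few hand-labelled web sites.
samples = [
    ("Budget 2009", True),
    ("Municipal budget plan", True),
    ("Annual budget report", True),
    ("Contact us", False),
    ("Photo gallery", False),
    ("News archive", False),
]

clf = LinkTextClassifier()
for text, label in samples:
    clf.train(text, label)

# Link texts found on a web site that has not been labelled by hand.
links = ["Budget and finance plan", "Contact the mayor", "Photo archive"]
to_download = [link for link in links if clf.should_download(link)]
print(to_download)
```

In this sketch only links whose text resembles the labelled samples are followed, which mirrors the stated goal of spending download effort on pages likely to contain the material sought. How candidate links are prioritised across a whole site is left out, since the abstract gives no detail on that part.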