General web search engines, such as Google, Yahoo and Bing, have been very successful information retrieval tools. However, many users with domain-specific interests are still disappointed with the responses they obtain from these generic tools. This situation has motivated the creation of domain-specific search engines, which can offer higher accuracy at a low maintenance and infrastructure cost. This paper introduces a method to discover domain-specific web content delimited by a country context. The method allows a search engine to improve its accuracy for users who are interested in domain-specific web content from a particular country. It relies on supervised classifiers and defines country bounds for the search. To delimit the country context, our web content extraction process draws on several sources: Uniform Resource Locators (URLs), official government web pages, the Network Information Center (NIC), and the IP address ranges allocated to the country of interest. Details of the system architecture are presented. A proof of concept was carried out using the Information and Communication Technologies (ICT) domain in the Mexican context. The test prototype obtained encouraging results.
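To make the country-delimiting step more concrete, the following is a minimal sketch of how the sources named in the abstract (URL, government listings, IP ranges) might be combined into a single membership test. It is an illustration under stated assumptions, not the paper's implementation: the names `COUNTRY_TLD`, `COUNTRY_NETWORKS`, `LISTED_HOSTS`, and `in_country_context` are hypothetical, the IP range is a reserved documentation block rather than a real allocation, and the supervised domain classifier that would run on the pages passing this filter is not shown.

```python
import ipaddress
from urllib.parse import urlparse

# All values below are hypothetical placeholders for illustration only.
COUNTRY_TLD = ".mx"  # country-code top-level domain of interest
# Reserved documentation range, standing in for NIC-reported allocations.
COUNTRY_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Hosts assumed to be harvested from official government web pages.
LISTED_HOSTS = {"portal.example.org"}


def in_country_context(url, ip=None):
    """Return True if any available signal places the URL in the country."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # no parseable host, so no evidence
    if host.endswith(COUNTRY_TLD):
        return True  # signal 1: country-code TLD in the URL
    if host in LISTED_HOSTS:
        return True  # signal 2: host listed on a government page
    if ip is not None:
        addr = ipaddress.ip_address(ip)
        # signal 3: server address inside a range allocated to the country
        return any(addr in net for net in COUNTRY_NETWORKS)
    return False


if __name__ == "__main__":
    print(in_country_context("http://www.example.mx/tic"))
    print(in_country_context("http://example.com/", ip="203.0.113.7"))
    print(in_country_context("http://example.com/", ip="198.51.100.1"))
```

The sketch treats the signals as alternatives, so a page hosted under a generic TLD can still be included when its server address falls in a country range. Whether the signals should be combined by disjunction, weighting, or some other rule is a design choice the abstract does not specify.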