Automatic discovery of web content related to IT in the Mexican internet based on supervised classifiers

  • Authors:
  • José-Lázaro Martínez-Rodríguez;Víctor-Jesús Sosa-Sosa;Iván López-Arévalo

  • Affiliations:
  • Information Technology Laboratory at Technologic and Scientific Park TECNOTAM, CINVESTAV IPN, Cd. Victoria, Tamps., México;Information Technology Laboratory at Technologic and Scientific Park TECNOTAM, CINVESTAV IPN, Cd. Victoria, Tamps., México;Information Technology Laboratory at Technologic and Scientific Park TECNOTAM, CINVESTAV IPN, Cd. Victoria, Tamps., México

  • Venue:
  • MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
  • Year:
  • 2012

Abstract

General web search engines, such as Google, Yahoo and Bing, have been very successful information retrieval tools. However, many users with domain-specific interests are still disappointed with the results returned by these generic tools. This situation has motivated the creation of domain-specific search engines, which can offer higher accuracy at lower maintenance and infrastructure cost. This paper introduces a method to discover domain-specific web content delimited by a country context. The method allows a search engine to improve its accuracy for users who are interested in domain-specific web content from a particular country. Our method is based on supervised classifiers and defines country bounds for the search. To delimit the country context, our web content extraction process takes information from different sources, such as Uniform Resource Locators (URLs), official government web pages, the Network Information Center (NIC) and the IP address ranges allocated to the country of interest. Details of the system architecture are presented. A proof of concept was carried out using the Information and Communication Technologies (ICT) domain in the Mexican context. The testing prototype has obtained encouraging results.
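
As an illustration only, the country-context step described in the abstract could be sketched roughly as follows. This is not the authors' implementation: the function name `in_country_context`, the `.mx` suffix check, and the address ranges are assumptions made for this sketch, and the ranges shown are reserved documentation blocks rather than real Mexican allocations. The supervised classification step is not shown.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: country-code top-level domain used as the URL signal.
COUNTRY_TLD = ".mx"

# Assumption: placeholder networks (RFC 5737 documentation blocks),
# NOT real allocations. A real system would load ranges from a registry.
COUNTRY_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_country_context(url, resolved_ip=None):
    """Return True if the URL or its resolved IP suggests the country of interest."""
    # Normalize the host name: lowercase, without a trailing root dot.
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if host.endswith(COUNTRY_TLD):
        return True
    # Fall back to the IP signal when the caller supplies a resolved address.
    if resolved_ip is not None:
        addr = ipaddress.ip_address(resolved_ip)
        return any(addr in net for net in COUNTRY_NETWORKS)
    return False
```

Pages that pass a filter of this kind would then be handed to the domain classifier; combining the URL and IP signals means a site hosted in the country under a generic domain such as `.com` is not discarded outright.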