Web page analysis based on HTML DOM and its usage for forum statistics, alerts and geo targeted data retrieval

  • Authors:
  • Robert Györödi;Cornelia Györödi;George Pecherle;George Mihai Cornea

  • Affiliations:
  • Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Oradea, Romania;Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Oradea, Romania;Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Oradea, Romania;Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Oradea, Romania

  • Venue:
  • WSEAS Transactions on Computers
  • Year:
  • 2010

Abstract

Message boards belong to the part of the Internet known as the 'Invisible Web' and pose many problems for traditional search engine spiders. Their dynamic content is usually nested very deep and is difficult to search. In addition, many of these sites change their locations, servers, or URLs almost daily, which creates problems for the indexing process. Nevertheless, as the World Wide Web has grown, and with the help of search engines, message boards have become an important source of information for solving a wide range of problems. Another interesting feature of this type of web page is that large communities have formed around them, expressing different opinions and discussing various topics. Using special retrieval and indexing algorithms, mostly based on the HTML DOM tree, we have developed an algorithm that obtains detailed and accurate trend statistics, which can be used in different marketing solutions and analysis tools. Combined with the services provided by traffic ranking sites such as Alexa.com, we can also provide geo-targeting functionality that delivers even more accurate results to the end user, such as the percentage of a given forum's visitors who come from a given country.
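
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical forum whose posts are wrapped in `<div class="post">` elements, walks the HTML with Python's standard `html.parser`, and counts how many posts mention each keyword. The class name `post`, the sample page, the helper names `PostExtractor`, `keyword_counts`, and `country_share`, and the visit figures are all illustrative assumptions; no real traffic-ranking API is called.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the text of every <div class="post"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []    # text of each finished post
        self._depth = 0    # <div> nesting inside the current post; 0 = outside
        self._chunks = []  # text fragments of the post being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            self._depth += 1  # nested <div> inside a post (e.g. a quote)
        elif "post" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace and store the completed post.
                self.posts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def keyword_counts(html, keywords):
    """Return (posts, Counter of how many posts mention each keyword)."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for post in parser.posts:
        words = set(re.findall(r"[a-z0-9]+", post.lower()))
        for kw in keywords:
            if kw.lower() in words:
                counts[kw] += 1
    return parser.posts, counts


def country_share(visits):
    """Turn per-country visit counts into percentages of all visits."""
    total = sum(visits.values())
    return {country: 100 * n / total for country, n in visits.items()}


# Made-up page and made-up visit counts, for illustration only.
SAMPLE = """
<html><body>
  <div class="post"><b>alice</b>: my laptop battery drains fast</div>
  <div class="post">bob: same battery issue on my <i>phone</i></div>
  <div class="sidebar">ads about battery packs</div>
  <div class="post">carol: the screen is <div class="quote">very dim</div> today</div>
</body></html>
"""

posts, counts = keyword_counts(SAMPLE, ["battery", "screen", "phone"])
shares = country_share({"RO": 600, "US": 300, "DE": 100})
```

Selecting posts by their position in the element tree, rather than by searching the raw page text, keeps navigation and advertising blocks (the `sidebar` element in this sample) out of the statistics. A production crawler would also have to cope with per-forum markup differences, pagination, and the frequently changing URLs mentioned in the abstract.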