Web page analysis based on HTML DOM and its usage for forum statistics and alerts

  • Authors:
  • Robert Györödi;Cornelia Györödi;George Pecherle;George Mihai Cornea

  • Affiliations:
  • Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Romania;Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Romania;Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Romania;Department of Computer Science, Faculty of Electrical Engineering and Information Technology, University of Oradea, Romania

  • Venue:
  • ECC'10 Proceedings of the 4th conference on European computing conference
  • Year:
  • 2010

Abstract

Message boards belong to the part of the Internet known as the 'Invisible Web' and pose many problems for traditional search engine spiders. Their dynamic content is usually buried deep within a site and is difficult to search. In addition, many of these sites change their locations, servers, or URLs almost daily, which creates problems for the indexing process. Nevertheless, as the World Wide Web has grown, and with the help of search engines, message boards have become an important source of information for solving a variety of problems. Another interesting feature of this type of web page is that large communities have formed around them, expressing different opinions and discussing various topics. Using special retrieval and indexing algorithms, mostly based on the HTML DOM tree, we have developed an algorithm that obtains detailed and accurate trend statistics, which can be used in marketing solutions and analysis tools.
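
The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of the general idea of walking a forum page's element tree and deriving a simple trend statistic. It is not the authors' method. The markup layout (the class names "post", "date", and "body"), the names PostExtractor and keyword_trend, and the sample page are all assumptions made for the example; it uses only Python's standard html.parser module.

```python
from collections import Counter
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect (date, text) pairs from elements marked class="post".

    The class names "post", "date" and "body" are hypothetical; a real
    forum would need its own selectors.
    """

    # Void elements have no end tag, so they must not be pushed on the stack.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.posts = []    # finished posts as (date, text) tuples
        self._stack = []   # class attribute of every currently open element
        self._date = []    # text fragments of the current post's date
        self._body = []    # text fragments of the current post's body

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        self._stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag in self.VOID or not self._stack:
            return
        if self._stack.pop() == "post":
            date = "".join(self._date).strip()
            text = " ".join("".join(self._body).split())
            self.posts.append((date, text))
            self._date, self._body = [], []

    def handle_data(self, data):
        if "post" not in self._stack:
            return  # text outside any post is ignored
        if "date" in self._stack:
            self._date.append(data)
        elif "body" in self._stack:
            self._body.append(data)


def keyword_trend(html, keyword):
    """Count posts per month (YYYY-MM) whose body mentions the keyword."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for date, text in parser.posts:
        if keyword.lower() in text.lower():
            counts[date[:7]] += 1  # assumes ISO dates such as 2010-01-05
    return dict(counts)


# Hypothetical forum page used only for illustration.
SAMPLE = """
<div class="thread">
  <div class="post"><span class="date">2010-01-05</span>
    <div class="body">My new phone battery drains fast.</div></div>
  <div class="post"><span class="date">2010-01-20</span>
    <div class="body">Same here, <b>battery</b> is weak.</div></div>
  <div class="post"><span class="date">2010-02-02</span>
    <div class="body">The screen is great though.</div></div>
</div>
"""

if __name__ == "__main__":
    print(keyword_trend(SAMPLE, "battery"))
```

On the sample page, keyword_trend(SAMPLE, "battery") returns {"2010-01": 2}. Tracking the stack of open elements lets text inside nested tags (such as the bold word in the second post) still be attributed to the enclosing post body, which is the main reason to work on the tree structure rather than on raw text. The sketch matches the class attribute exactly, so an element with several classes would need additional handling.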