CRAWLING THE CONSTRUCTION WEB - A MACHINE-LEARNING APPROACH WITHOUT NEGATIVE EXAMPLES

  • Authors:
  • Milos Kovacevic;Colin H. Davidson

  • Affiliations:
  • University of Belgrade, School of Civil Engineering, Belgrade, Serbia;University of Montreal, School of Architecture, Montreal, Quebec, Canada

  • Venue:
  • Applied Artificial Intelligence
  • Year:
  • 2008

Abstract

Professionals and craftsmen in the construction sector rely heavily on information in their decision-making, yet they make only limited use of the abundant information potentially available to them, particularly on the web. Consequently, designs are impoverished, construction is defective, and innovation is delayed. To facilitate convivial access to focused information, we have developed a question-and-answer (Q-A) system (reported elsewhere). To support this system, we have developed an automated crawler that builds a bank of relevant pages adapted to the needs of this particular industry-user community. The crawler is based on a machine-learning framework in which an intelligent decision unit is trained to distinguish informative pages from nontopic ones. We show that standard approaches, which use both positive and negative classes, are sensitive to noise in the negative class. Because initially only a limited set of positive examples labeled by human experts is available, we propose and evaluate several techniques for learning without negative examples. Our crawler, which uses the positive examples-based learning (PEBL) framework, collects construction-oriented pages with high precision and a high discovery rate. It can also be used to build domain-specific collections of pages in other scientific or professional contexts.
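
To make the idea of a crawler guided by positive examples only more concrete, the following is a minimal sketch and not the authors' implementation. It swaps in a simple centroid-based one-class scorer in place of PEBL, and every name in it (`PositiveOnlyScorer`, `focused_crawl`, the in-memory `web` dictionary standing in for real HTTP fetches) is a hypothetical chosen for illustration. The acceptance threshold is assumed to be the lowest similarity that any labeled positive page has to the centroid of the positive set.

```python
import math
import re
from collections import Counter, deque


def tf_vector(text):
    """Unit-length term-frequency vector of a page's text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


class PositiveOnlyScorer:
    """One-class scorer trained on positive pages only (not PEBL itself)."""

    def fit(self, positive_pages):
        vectors = [tf_vector(p) for p in positive_pages]
        total = Counter()
        for vec in vectors:
            for word, weight in vec.items():
                total[word] += weight
        norm = math.sqrt(sum(x * x for x in total.values()))
        self.centroid = {w: x / norm for w, x in total.items()}
        # Assumption: accept anything at least as close to the centroid
        # as the least typical labeled positive page.
        self.threshold = min(cosine(v, self.centroid) for v in vectors)
        return self

    def score(self, text):
        return cosine(tf_vector(text), self.centroid)

    def is_on_topic(self, text):
        return self.score(text) >= self.threshold


def focused_crawl(seed_urls, web, scorer, limit=100):
    """Breadth-first crawl that follows links only from on-topic pages.

    `web` is a hypothetical dict mapping url -> (text, [links]).
    """
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    collected = []
    while frontier and len(collected) < limit:
        url = frontier.popleft()
        text, links = web.get(url, ("", []))
        if not scorer.is_on_topic(text):
            continue  # off-topic pages are dropped and not expanded
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected


positives = [
    "concrete slab reinforcement steel rebar",
    "steel beam concrete foundation load",
    "masonry wall concrete mortar brick",
]
scorer = PositiveOnlyScorer().fit(positives)
web = {
    "a": (positives[0], ["b", "c"]),
    "b": ("football league goal", ["d"]),
    "c": (positives[1], []),
    "d": (positives[2], []),
}
collected = focused_crawl(["a"], web, scorer)
```

In this toy run, page `d` is on-topic but is never reached because it is linked only from the off-topic page `b`. This illustrates the trade-off between precision and discovery rate that a focused crawler has to manage.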