An efficient approach to clustering real-estate listings

  • Authors:
  • Maciej Grzenda; Deepak Thukral

  • Affiliations:
  • Warsaw University of Technology, Faculty of Mathematics and Information Science, Warszawa, Poland; TESOBE Music Pictures Ltd., Berlin, Germany

  • Venue:
  • IDEAL'10: Proceedings of the 11th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2010

Abstract

The World Wide Web (WWW) is a vast source of information, and the problem of information overload is more acute than ever. Because of the noise on the WWW, finding usable information is becoming increasingly hard. Real-estate listings are frequently offered through several real-estate agencies and published on different web sites; as a consequence, differences in price and description can be observed between listings of the same property. At the same time, a potential buyer or renter may prefer to obtain the complete description of a property of interest, assembled from the data available on different portals, and, where possible, to track changes in its price. This problem illustrates a wider class of problems: integrating data from numerous semi-structured web data sources. The paper investigates how clustering algorithms can be used to identify individual real-estate properties described on different portals. Clustering algorithms have been used to group the records acquired from different web sources. Standard clustering methods have been evaluated, and a method using a new distance function that combines the similarity of semi-structured and unstructured data has been proposed. The latter approach has yielded a substantial improvement in clustering results.
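The abstract does not specify the form of the proposed distance function. As a rough illustration only, the sketch below assumes one possible shape: a weighted sum of relative differences on a few semi-structured attributes (price, area, number of rooms) and a Jaccard distance on the words of the free-text description, fed into a simple single-linkage grouping with a fixed threshold. The `Listing` fields, the weight `w_struct`, the threshold `0.35`, and the sample records are hypothetical choices for illustration, not the authors' method or data.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations

PUNCT = ".,;:!?()"


@dataclass
class Listing:
    # Hypothetical record layout: semi-structured fields plus free text.
    source: str
    price: float
    area_m2: float
    rooms: int
    description: str


def tokens(text: str) -> set[str]:
    # Lowercased word tokens with surrounding punctuation stripped.
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    # 0.0 for identical token sets, 1.0 for disjoint ones.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def relative_diff(x: float, y: float) -> float:
    # Relative difference in [0, 1] for non-negative values; 0 when equal.
    m = max(abs(x), abs(y))
    return 0.0 if m == 0 else abs(x - y) / m


def combined_distance(p: Listing, q: Listing, w_struct: float = 0.5) -> float:
    # Semi-structured part: mean relative difference over numeric attributes.
    struct = (relative_diff(p.price, q.price)
              + relative_diff(p.area_m2, q.area_m2)
              + relative_diff(p.rooms, q.rooms)) / 3
    # Unstructured part: Jaccard distance between description token sets.
    text = jaccard_distance(tokens(p.description), tokens(q.description))
    return w_struct * struct + (1 - w_struct) * text


def cluster(listings: list[Listing], threshold: float = 0.35,
            w_struct: float = 0.5) -> list[list[int]]:
    # Single-linkage grouping via union-find: any pair closer than
    # `threshold` ends up in the same cluster.
    parent = list(range(len(listings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(listings)), 2):
        if combined_distance(listings[i], listings[j], w_struct) < threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(listings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical records: the first two describe the same flat on two portals.
listings = [
    Listing("portal-a", 450000, 62.0, 3,
            "Sunny 3-room flat near metro, renovated kitchen"),
    Listing("portal-b", 459000, 62.5, 3,
            "Renovated 3-room flat, sunny, close to metro"),
    Listing("portal-a", 1200000, 140.0, 5,
            "Detached house with garden and garage"),
]

print(cluster(listings))  # [[0, 1], [2]]
```

Here the two flat listings differ slightly in price, area, and wording, yet their combined distance (about 0.23) falls below the threshold, while the house stays in its own cluster. Weighting the two parts lets small price differences between portals be tolerated as long as the descriptions overlap, which is the kind of trade-off a combined semi-structured/unstructured distance is meant to capture; in practice the weights and threshold would need tuning on labelled data.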