The World Wide Web (WWW) is a vast source of information, and the problem of information overload is more acute than ever. Because of noise on the Web, it is becoming hard to find usable information. Real estate listings are frequently offered through several real estate agencies and published on different web sites. As a consequence, the same property can appear with differences in price and description. At the same time, a potential buyer or renter may prefer to get the entire description of a property of interest based on the data available on different portals and, if possible, track the changes in price. This problem illustrates a wider class of problems concerning the integration of data from numerous semi-structured web data sources. The paper investigates how clustering algorithms can be used to identify individual real estate properties described on different portals. Clustering algorithms have been used to group the records acquired from different web sources. Standard clustering methods have been evaluated, and a method using a new distance function that combines the similarity of semi-structured and unstructured data has been proposed. The latter approach has allowed substantial improvement in clustering results.
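
The abstract does not spell out the proposed distance function, so the following is only a minimal sketch of the general idea: a distance that mixes attribute-level (semi-structured) similarity with free-text (unstructured) similarity, followed by a simple grouping step. Everything in it is an assumption made for illustration, not the paper's method: the `Listing` fields, the relative-difference and Jaccard measures, the `weight` and `threshold` values, and the single-linkage grouping are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical record scraped from one portal (fields are assumptions)."""
    price: float       # asking price, single currency assumed
    area_m2: float     # floor area in square metres
    rooms: int         # number of rooms
    description: str   # free-text description from the portal


def relative_diff(a: float, b: float) -> float:
    """Relative difference in [0, 1] for non-negative inputs; 0 when equal."""
    top = max(abs(a), abs(b))
    return 0.0 if top == 0 else abs(a - b) / top


def structured_distance(x: Listing, y: Listing) -> float:
    """Mean relative difference over the numeric attributes."""
    parts = [
        relative_diff(x.price, y.price),
        relative_diff(x.area_m2, y.area_m2),
        relative_diff(x.rooms, y.rooms),
    ]
    return sum(parts) / len(parts)


def text_distance(x: Listing, y: Listing) -> float:
    """Jaccard distance between the sets of lower-cased description tokens."""
    a = set(x.description.lower().split())
    b = set(y.description.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def combined_distance(x: Listing, y: Listing, weight: float = 0.5) -> float:
    """Weighted mix of the attribute distance and the text distance."""
    return weight * structured_distance(x, y) + (1.0 - weight) * text_distance(x, y)


def cluster(listings, threshold: float = 0.35, weight: float = 0.5):
    """Single-linkage grouping: any pair closer than `threshold` is merged.

    Returns a sorted list of clusters, each a list of indices into `listings`.
    """
    parent = list(range(len(listings)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(listings)):
        for j in range(i + 1, len(listings)):
            if combined_distance(listings[i], listings[j], weight) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(listings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up sample: the first two records describe the same flat on two portals.
sample = [
    Listing(250000, 62.0, 3, "bright three room flat near city park with balcony"),
    Listing(245000, 62.0, 3, "three room flat with balcony near the city park"),
    Listing(480000, 140.0, 6, "detached house with large garden and garage"),
]
clusters = cluster(sample)
```

With this made-up sample, the two flat listings differ slightly in price and wording but end up in one cluster, while the house stays separate. In practice the weight and threshold would need tuning against labelled data, and a token-set measure is a crude stand-in for whatever text similarity the authors actually used.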