A novel method for extracting product descriptions from e-commerce websites is presented. The algorithm consists of three major steps: (1) extracting descriptions of appropriate length from the source documents related to the search query, using shallow text analysis methods; (2) assigning each description to one of the predefined categories by means of text classification; and (3) grouping the results with a text clustering algorithm and returning the descriptions found in the highest-quality clusters. The recall and precision of the search are examined using a set of queries for laptops currently sold on popular shopping sites. Although extraction based purely on classification and extraction based purely on clustering both give acceptable results, the highest precision is achieved when the two are used together. It was also observed that examining roughly the first 20 sites returned by Google is sufficient to obtain high-quality descriptions of popular products.
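The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the classifier, the clustering algorithm, or the cluster quality measure, so the keyword-overlap classifier, the greedy Jaccard clustering, the use of cluster size as "quality", and all names and thresholds below (`CATEGORY_KEYWORDS`, `min_words`, `max_words`, `threshold`) are assumptions chosen only to make the pipeline concrete. A small helper for the precision and recall figures mentioned above is included.

```python
import re

# Hypothetical categories and keywords; the paper's real categories are not given here.
CATEGORY_KEYWORDS = {
    "description": {"processor", "display", "battery", "memory", "screen"},
    "other": {"shipping", "returns", "payment", "cookies"},
}


def tokens(text):
    """Lower-cased set of alphanumeric tokens (a very shallow text analysis)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_candidates(documents, min_words=5, max_words=60):
    """Step 1: keep paragraphs whose length falls in an assumed acceptable range."""
    candidates = []
    for doc in documents:
        for para in doc.split("\n"):
            para = para.strip()
            if min_words <= len(para.split()) <= max_words:
                candidates.append(para)
    return candidates


def classify(text):
    """Step 2: stand-in classifier that picks the category with most keyword hits."""
    toks = tokens(text)
    scores = {cat: len(toks & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.3):
    """Step 3: greedy single-pass clustering against each cluster's first member."""
    clusters = []
    for text in texts:
        toks = tokens(text)
        for members in clusters:
            if jaccard(toks, tokens(members[0])) >= threshold:
                members.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def best_descriptions(documents):
    """Run all three steps; cluster size stands in for cluster quality (assumption)."""
    kept = [t for t in extract_candidates(documents) if classify(t) == "description"]
    clusters = cluster(kept)
    return max(clusters, key=len) if clusters else []


def precision_recall(returned, relevant):
    """Precision and recall of the returned items against a relevant set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Combining the steps mirrors the abstract's observation: the classifier discards off-topic paragraphs before clustering, and the clustering then favours descriptions that several sites agree on.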