Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
IEPAD: information extraction based on pattern discovery
Proceedings of the 10th international conference on World Wide Web
A Fully Automated Object Extraction System for the World Wide Web
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Automated data extraction from the web with conditional models
International Journal of Business Intelligence and Data Mining
Information Extraction System Based on Hidden Markov Model
ISNN '09 Proceedings of the 6th International Symposium on Neural Networks on Advances in Neural Networks
A simhash-based scheme for locating product information from the web
Proceedings of the Second Symposium on Information and Communication Technology
Hi-index | 0.00 |
Mining product descriptions (PDs) from e-commercial web sites is an important task in information extraction from the Web. In this paper, we propose an efficient technique for this task. The technique first discovers the set of PDs based on the measure of entropy at each internal node in the HTML tag tree. Afterwards, a set of association rules based on heuristic features is employed to filter the output and therefore enhance the precision. The experimental results of PEWeb system show that the proposed method outperforms existing automatic techniques remarkably.