User models: theory, method, and practice
International Journal of Man-Machine Studies
Generating finite-state transducers for semi-structured data extraction from the Web
Information Systems - Special issue on semistructured data
Learning Information Extraction Rules for Semi-Structured and Free Text
Machine Learning - Special issue on natural language learning
Relational learning of pattern-match rules for information extraction
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Content-based book recommending using learning for text categorization
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Semi-Automatic Wrapper Generation for Internet Information Sources
COOPIS '97 Proceedings of the Second IFCIS International Conference on Cooperative Information Systems
Toward general-purpose learning for information extraction
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Hi-index | 0.00 |
This paper presents a method of mining HTML documents into structured documents and of filtering structured documents by using both slot weighting and token weighting. The goal of a mining algorithm is to find slot-token patterns in HTML documents. In order to express user interests in structured document filtering, slot and token are considered. Our preference computation algorithm applies vector similarity and Bayesian probability to filter structured documents. The experimental results show that it is important to consider hyperlinking and unlablelling in mining HTML texts; slot and token weighting can enhance the performance of structured document filtering.