Enhanced hypertext categorization using hyperlinks
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Constructing, organizing, and visualizing collections of topically related Web resources
ACM Transactions on Computer-Human Interaction (TOCHI)
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Relational learning with statistical predicate invention: better models for hypertext
Machine Learning - Special issue on inducive logic programming
Web site mining: a new way to spot competitors, customers and suppliers in the world wide web
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Two-Phase Web Site Classification Based on Hidden Markov Tree Models
WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence
Web unit mining: finding and classifying subgraphs of web pages
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Hi-index | 0.00 |
A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover and handle incomplete web units by merging them together and assigning correct labels. Experiments show that the overall accuracy has been significantly improved.