The Knowledge Engineering Review
A vector space model (VSM) algorithm for Web document classification is proposed, based on an extension of rough sets, the tolerance rough set. First, Web documents are represented as term vectors in the vector space model. Then term co-occurrence values are used to define the tolerance class of each term, which extends the ability of terms to represent a document. Finally, a Web document classification algorithm is implemented in which the similarity between documents is described by term tolerance classes. Experiments are conducted on data sets collected from two Web portals, Yahoo and the Open Directory Project.
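The three steps can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract gives no formulas. The sketch assumes the usual tolerance rough set construction: two terms are tolerant of each other when they co-occur in at least `theta` documents, and a document is enriched with every term whose tolerance class overlaps it (its upper approximation). The function names, the threshold `theta`, the binary term weights, and the toy documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with
    in at least `theta` documents (assumed tolerance relation)."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """All terms whose tolerance class shares a term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def cosine(a, b):
    """Cosine similarity of two term sets under binary weights."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical toy corpus, for illustration only.
docs = [
    ["rough", "set", "classification"],
    ["rough", "set", "tolerance"],
    ["web", "document", "classification"],
    ["web", "document", "portal"],
]
classes = tolerance_classes(docs, theta=2)

d1, d2 = ["rough"], ["set"]
plain = cosine(set(d1), set(d2))
enriched = cosine(upper_approximation(d1, classes),
                  upper_approximation(d2, classes))
print(plain, enriched)
```

In this toy corpus the two one-term documents share no term, so their plain similarity is zero. Because "rough" and "set" co-occur in two documents, each falls in the other's tolerance class, and the enriched representations coincide. A classifier such as nearest neighbour could then use the enriched similarity; the abstract does not say which classifier the paper uses.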