A comparative study of citations and links in document classification
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
FLUX-CIM: flexible unsupervised extraction of citation metadata
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
User-assisted similarity estimation for searching related web pages
Proceedings of the eighteenth conference on Hypertext and hypermedia
Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations
DS '08 Proceedings of the 11th International Conference on Discovery Science
Intelligent hybrid approach to false identity detection
Proceedings of the 12th International Conference on Artificial Intelligence and Law
Hybrid clustering for validation and improvement of subject-classification schemes
Information Processing and Management: an International Journal
Journal of Management Information Systems
Fuzzy Sets and Rough Sets for Scenario Modelling and Analysis
RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Semi-supervised OWA aggregation for link-based similarity evaluation and alias detection
FUZZ-IEEE'09 Proceedings of the 18th international conference on Fuzzy Systems
Revisit of nearest neighbor test for direct evaluation of inter-document similarities
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Improving annotation categorization performance through integrated social annotation computation
Expert Systems with Applications: An International Journal
Classifying documents with link-based bibliometric measures
Information Retrieval
Disclosing false identity through hybrid link analysis
Artificial Intelligence and Law
Use of Medical Subject Headings (MeSH) in Portuguese for categorizing web-based healthcare content
Journal of Biomedical Informatics
Journal of Information Science
Combination of document structure and links for multimedia object retrieval
Journal of Information Science
QUBiC: An adaptive approach to query-based recommendation
Journal of Intelligent Information Systems
Pairwise similarity for cluster ensemble problem: link-based and approximate approaches
Transactions on Large-Scale Data- and Knowledge-centered systems IX
Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine a page's topic. However, the Web provides an additional source of information that can be useful for document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to derive a measure of similarity appropriate for document classification. They experiment with five different similarity measures and assess how well each predicts the topic of a Web page. Tests performed on a Web directory show that link information alone allows documents to be classified with an average precision of 86%. When combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63% to 132% over text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. The authors also provide a set of guidelines on how link structure can be used effectively to classify Web documents. © 2006 Wiley Periodicals, Inc.
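The abstract does not name the five similarity measures, the classifier, or the way link and text evidence are combined. Under that caveat, the following is a minimal sketch of the general idea. It uses two classic bibliometric measures, co-citation (pages that link to both documents) and bibliographic coupling (pages both documents link to), as stand-ins. These feed a k-nearest-neighbor classifier whose per-class scores are then fused with a text classifier's scores by a weighted sum. The toy graph, the page names, `k`, `alpha`, and the fusion rule are all hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Set

Graph = Dict[str, Set[str]]              # page -> pages it links to (out-links)
Similarity = Callable[[str, str], float]


def invert(graph: Graph) -> Graph:
    """Build the in-link graph: page -> pages that link to it."""
    inv: Graph = {page: set() for page in graph}
    for src, targets in graph.items():
        for dst in targets:
            inv.setdefault(dst, set()).add(src)
    return inv


def cocitation(a: str, b: str, inv: Graph) -> int:
    """Co-citation: number of pages that link to both a and b."""
    return len(inv.get(a, set()) & inv.get(b, set()))


def coupling(a: str, b: str, graph: Graph) -> int:
    """Bibliographic coupling: number of pages linked to by both a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))


def knn_scores(doc: str, labels: Dict[str, str], sim: Similarity, k: int = 3) -> Counter:
    """Per-class sum of similarities over the k most similar labeled pages."""
    neighbors = sorted(((sim(doc, d), d) for d in labels if d != doc), reverse=True)[:k]
    scores: Counter = Counter()
    for score, d in neighbors:
        if score > 0:                    # pages sharing no links carry no evidence
            scores[labels[d]] += score
    return scores


def classify(doc: str, labels: Dict[str, str], sim: Similarity, k: int = 3) -> Optional[str]:
    """Return the highest-scoring class, or None if no neighbor shares any links."""
    scores = knn_scores(doc, labels, sim, k)
    return scores.most_common(1)[0][0] if scores else None


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to sum to 1 so link and text scores are comparable."""
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()} if total else {}


def combine(link_scores: Dict[str, float], text_scores: Dict[str, float],
            alpha: float = 0.5) -> Optional[str]:
    """Hypothetical fusion rule: weighted sum of normalized per-class scores."""
    link, text = normalize(link_scores), normalize(text_scores)
    fused = {c: alpha * link.get(c, 0.0) + (1 - alpha) * text.get(c, 0.0)
             for c in set(link) | set(text)}
    return max(fused, key=fused.get) if fused else None


# Hypothetical toy data: "new" is linked from the same hub as the sports pages
# and links to the same target as they do, so both measures point to "sports".
graph: Graph = {
    "hub_sports": {"s1", "s2", "new"},
    "hub_music": {"m1", "m2"},
    "s1": {"espn"}, "s2": {"espn"}, "new": {"espn"},
    "m1": {"label"}, "m2": {"label"},
}
inv = invert(graph)
labels = {"s1": "sports", "s2": "sports", "m1": "music", "m2": "music"}

print(classify("new", labels, lambda a, b: cocitation(a, b, inv)))  # sports
```

Normalizing before fusing is one simple way to keep raw link counts from swamping probability-like text scores. The weight `alpha` would in practice be tuned on held-out data.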