It is well known that links are an important source of information when dealing with Web collections. However, it remains an open question whether the techniques used on the Web can be applied to collections of scientific papers connected by citations. In this work we present a comparative study of digital library citations and Web links in the context of automatic text classification, and we show that citations and links do in fact behave differently in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains of up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in the digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation-link-based classifier. This combination is based on the notion of classifier reliability and yielded gains of up to 14% in micro-averaged F1 in the Web collection, although no significant gain was obtained in the digital library. Finally, we performed a user study to further investigate the causes of these results. We found that the documents misclassified by the citation-link-based classifiers are in fact difficult cases, hard to classify even for humans.
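The two link-based similarity measures compared above have standard definitions. Two documents are bibliographically coupled when they cite the same documents. They are co-cited when the same documents cite both of them. The minimal Python sketch below computes both counts on a toy citation graph stored in a hypothetical `refs` dictionary that maps each document id to the set of ids it cites.

It also includes `combine_by_reliability`, a hypothetical and deliberately simple reading of a reliability-based combination: it trusts whichever classifier reports the higher top-class score. The paper's actual reliability estimate and classifiers are not described in this abstract and may differ.

```python
from collections import defaultdict
from itertools import combinations


def bibliographic_coupling(refs):
    """Coupling strength of (a, b) = number of references a and b share.

    refs: dict mapping each citing document to the set of documents it cites.
    Only pairs with at least one shared reference are returned.
    """
    scores = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            scores[(a, b)] = shared
    return scores


def co_citation(refs):
    """Co-citation strength of (a, b) = number of documents citing both a and b."""
    scores = defaultdict(int)
    for cited in refs.values():
        # Every pair of documents in one reference list is co-cited once more.
        for a, b in combinations(sorted(cited), 2):
            scores[(a, b)] += 1
    return dict(scores)


def combine_by_reliability(text_scores, link_scores):
    """Hypothetical combination rule: follow the more confident classifier.

    Each argument maps class label -> score in [0, 1]; ties favor the text
    classifier. This is an illustrative assumption, not the paper's method.
    """
    text_label, text_conf = max(text_scores.items(), key=lambda kv: kv[1])
    link_label, link_conf = max(link_scores.items(), key=lambda kv: kv[1])
    return link_label if link_conf > text_conf else text_label


if __name__ == "__main__":
    # Toy citation graph: p1..p3 are citing papers, x..z are cited papers.
    refs = {"p1": {"x", "y"}, "p2": {"x", "y", "z"}, "p3": {"z"}}
    print(bibliographic_coupling(refs))  # p1 and p2 share two references
    print(co_citation(refs))             # x and y are cited together twice
    print(combine_by_reliability({"AI": 0.6, "DB": 0.4}, {"DB": 0.9, "AI": 0.1}))
```

The toy graph shows why the two measures can disagree. Bibliographic coupling relates the citing papers (p1, p2, p3) through their outgoing references. Co-citation relates the cited papers (x, y, z) through the papers that point to them. Which measure helps more therefore depends on whether a collection's documents tend to share references or tend to be cited together.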