Finding similar academic web sites with links, bibliometric couplings and colinks

Authors:
Mike Thelwall; David Wilkinson
Affiliations:
School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK (both authors)
Venue:
Information Processing and Management: an International Journal
Year:
2004

Abstract

A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken, and human assessments of site similarity, based upon content type, were compared against ratings for the three measures. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increase in the likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage, in that a much larger number of pairs of sites are connected by these measures, rather than an increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
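
As a rough illustration of the three measures named in the abstract, the sketch below computes them for every pair of sites in a small link graph. It assumes the usual bibliometric definitions: a direct link exists if either site links to the other, the colink count is the number of third-party sites linking to both, and the coupling count is the number of third-party sites that both link to. The function name `pair_measures`, the input layout and the toy domains are hypothetical, and the paper's own counting rules may differ.

```python
from itertools import combinations


def pair_measures(out_links):
    """Compute link, colink and coupling values for every pair of sites.

    out_links maps each site to the set of sites it links to
    (a hypothetical input layout chosen for this sketch).
    """
    # Collect every site that appears as a source or a target.
    sites = set(out_links)
    for targets in out_links.values():
        sites |= targets

    # Invert the graph so we can look up which sites link to each site.
    in_links = {site: set() for site in sites}
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].add(source)

    results = {}
    for a, b in combinations(sorted(sites), 2):
        out_a = out_links.get(a, set())
        out_b = out_links.get(b, set())
        results[(a, b)] = {
            # Direct link in either direction.
            "link": b in out_a or a in out_b,
            # Third-party sites that link to both a and b.
            "colinks": len((in_links[a] & in_links[b]) - {a, b}),
            # Third-party sites that both a and b link to.
            "couplings": len((out_a & out_b) - {a, b}),
        }
    return results


# Hypothetical toy graph, not data from the paper.
toy_graph = {
    "a.ac.uk": {"b.ac.uk", "c.ac.uk"},
    "b.ac.uk": {"c.ac.uk"},
    "d.ac.uk": {"a.ac.uk", "b.ac.uk"},
}
measures = pair_measures(toy_graph)
print(measures[("a.ac.uk", "b.ac.uk")])
```

Under these assumptions, a pair can have nonzero colink or coupling counts without any direct link between the two sites, which is one way to picture the wider coverage the abstract attributes to those two measures.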