IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
This paper proposes a novel metric for web community analysis called language homogeneity. The language homogeneity of a community is the fraction of its web pages that are written in a specific language. Simple as it is, the metric offers additional insight into the characteristics of web communities. We analyze web communities extracted from large Thai web datasets with respect to (1) community size distribution, (2) similarity with a web directory, and (3) Thai language homogeneity. We find that most Thai web communities are linguistically homogeneous: web pages inside the same community tend to be written in the same language. Based on these results, we argue that the linguistic homogeneity of web communities can be used to enhance language-specific crawling. To this end, we point out current limitations of language-specific crawlers and suggest ways of exploiting communities' language homogeneity to improve crawling performance.
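As an illustration of the metric described above, the following minimal sketch computes the language homogeneity of a single community. It assumes that each page has already been assigned one language label by some external language identifier, and it reads "ratio" as the share of pages in the target language among all pages of the community. The function name `language_homogeneity`, the language codes, and the sample community are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable


def language_homogeneity(page_languages: Iterable[str], target: str) -> float:
    """Return the share of pages in a community labeled with `target`.

    Assumption: one language label per page, produced elsewhere.
    """
    langs = list(page_languages)
    if not langs:
        raise ValueError("community has no pages")
    # Counter returns 0 for a language that never occurs.
    return Counter(langs)[target] / len(langs)


# Hypothetical community of four pages: three Thai, one English.
community = ["th", "th", "th", "en"]
score = language_homogeneity(community, "th")
print(score)  # 0.75
```

Under this reading, a score close to 1 indicates a community written almost entirely in the target language, which is the property the abstract suggests a language-specific crawler could exploit.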