Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
The connectivity server: fast access to linkage information on the Web
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Breadth-first crawling yields high-quality pages
Proceedings of the 10th international conference on World Wide Web
Creating a Web community chart for navigating related communities
Proceedings of the 12th ACM conference on Hypertext and Hypermedia
Extracting Large-Scale Knowledge Bases from the Web
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The Evolution of the Web and Implications for an Incremental Crawler
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Who Links to Whom: Mining Linkage between Web Sites
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
On the bursty evolution of blogspace
WWW '03 Proceedings of the 12th international conference on World Wide Web
A large-scale study of the evolution of web pages
WWW '03 Proceedings of the 12th international conference on World Wide Web
Stochastic models for the Web graph
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Extracting evolution of web communities from a series of web archives
Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
What's new on the web?: the evolution of the web from a search engine perspective
Proceedings of the 13th international conference on World Wide Web
Proceedings of the 13th international conference on World Wide Web
Sic transit gloria telae: towards an understanding of the web's decay
Proceedings of the 13th international conference on World Wide Web
Information diffusion through blogspace
Proceedings of the 13th international conference on World Wide Web
Trend detection through temporal link analysis
Journal of the American Society for Information Science and Technology - Special issue: Webometrics
Proceedings of the 9th annual ACM international workshop on Web information and data management
Genealogical trees on the web: a search engine user perspective
Proceedings of the 17th international conference on World Wide Web
Estimating the Change of Web Pages
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007
A three-year study on the freshness of web search engine databases
Journal of Information Science
A method for measuring the evolution of a topic on the Web: The case of “informetrics”
Journal of the American Society for Information Science and Technology
DS'07 Proceedings of the 10th international conference on Discovery science
Socio-sense: a system for analysing the societal behavior from long term web archive
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Proceedings of the 2011 ACM Symposium on Applied Computing
Fires on the web: towards efficient exploring historical web graphs
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Noise robust detection of the emergence and spread of topics on the web
Proceedings of the 2nd Temporal Web Analytics Workshop
Hi-index | 0.00 |
Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such changes can be observed by crawling the Web periodically. In practice, however, it is impossible to crawl the entire expanding Web repeatedly. This means that the novelty of a page remains unknown, even if that page did not exist in previous snapshots. In this paper, we propose a novelty measure for estimating the certainty that a newly crawled page appeared between the previous and current crawls. Using this novelty measure, new pages can be extracted from a series of unstable snapshots for further analysis and mining to identify new trends on the Web. We evaluated the precision, recall, and miss rate of the novelty measure using our Japanese web archive, and applied it to a Web archive search engine.