Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether researchers are building collections of resources or studying the search engines themselves, the search engines request that they use the APIs and not "scrape" the WUIs. However, anecdotal evidence suggests that the two interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the API and WUI of Google, MSN and Yahoo. We queried both interfaces for five months and found significant discrepancies between them in several categories. In general, we found MSN to produce the most consistent results between its two interfaces. Our findings suggest that the API indexes are not older, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results for a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months.
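To make the two quantities discussed above concrete, here is a minimal Python sketch of how interface agreement and result decay might be quantified. It is an illustration only, not the method used in the study: the function names `overlap_at_k` and `half_life_days` are hypothetical, the overlap measure is a plain set intersection of the top-k URLs, and the decay model is assumed to be exponential, which the abstract does not state.

```python
import math


def overlap_at_k(api_results, wui_results, k=10):
    """Fraction of the top-k URLs returned by both interfaces.

    Assumption: agreement is measured as simple set overlap,
    ignoring the order of the results within the top k.
    """
    top_api = set(api_results[:k])
    top_wui = set(wui_results[:k])
    return len(top_api & top_wui) / k


def half_life_days(surviving_fraction, elapsed_days):
    """Days until half of the original top results have been replaced.

    Assumption: results decay exponentially, so that
    surviving_fraction = exp(-rate * elapsed_days).
    """
    if not 0.0 < surviving_fraction < 1.0:
        raise ValueError("surviving_fraction must be strictly between 0 and 1")
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    rate = -math.log(surviving_fraction) / elapsed_days
    return math.log(2) / rate


if __name__ == "__main__":
    # Made-up result lists, for illustration only.
    api = ["a.example", "b.example", "c.example", "d.example"]
    wui = ["b.example", "a.example", "x.example", "y.example"]
    print(overlap_at_k(api, wui, k=4))

    # Made-up observation: 80% of the original results remain after 150 days.
    print(half_life_days(0.8, 150))
```

Under this assumed model, a half-life of more than a year corresponds to the slow replacement reported for Google and Yahoo, and a half-life of 2-3 months to the faster replacement reported for MSN.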