False Reports About Yahoo! Blocking Googlebot On del.icio.us

February 21, 2008 | July 30th, 2023

A recent post by web developer Colin Cochrane, which got more attention than it deserved on Sphinn and in the SitePro newsletter, mistakenly claims that Yahoo! has decided to play hardball with the competition.

Over the weekend, Colin found that the robots.txt file on Yahoo!’s social bookmarking property del.icio.us blocked search engine spiders, including Googlebot, from crawling certain directories. The extract from the del.icio.us robots.txt file pasted below shows the “offending” code:

User-agent: Googlebot
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
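
To see which paths these rules actually cover, here is a minimal sketch of a rule evaluator. It assumes the longest-match precedence described in RFC 9309 and in Google's current robots.txt documentation (the most specific matching rule wins, and a tie goes to Allow). It ignores wildcards, and the sample paths `/popular` and `/tag/seo` are illustrative assumptions, not paths taken from the original post.

```python
# Rules copied from the Googlebot section of the del.icio.us robots.txt extract.
RULES = [
    ("allow", "/"),
    ("disallow", "/inbox"),
    ("disallow", "/subscriptions"),
    ("disallow", "/network"),
    ("disallow", "/search"),
    ("disallow", "/post"),
    ("disallow", "/login"),
    ("disallow", "/rss"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence.

    Assumption: plain prefix matching only (no `*` or `$` wildcards).
    A path that matches no rule at all is treated as allowed.
    """
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    # Hypothetical sample paths, chosen only to exercise the rules.
    for sample in ["/", "/popular", "/tag/seo", "/inbox", "/search?q=seo", "/rss/tag"]:
        print(sample, "allowed" if is_allowed(sample) else "blocked")
```

Under these assumptions, only the seven listed prefixes are blocked and every other path stays crawlable because of the `Allow: /` line. Note that a parser that applies the first matching rule instead of the longest one would treat `Allow: /` as overriding every Disallow line, so results can differ between tools.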

Colin then spoofed the Googlebot user agent to see what was being delivered to the spider when it tried to access one of those pages, and found that he was served a 404 error.

Colin’s erroneous post

Yahoo! recently tested featuring information about del.icio.us bookmarks on its search results pages. Colin, along with a number of other SEO enthusiasts, immediately jumped to the conclusion that Yahoo! is exercising its right to prevent competitors from benefiting from del.icio.us.

What Colin and the others are overlooking are three extremely important facts:

  1. The directories being blocked by the robots.txt contain general administrative pages such as the “Add a URL” page, which do not need to be indexed by the search engines.
  2. The 404 result is delivered because the user is spoofing a search engine spider. The del.icio.us site is most likely smart enough to detect this, and therefore denies access to its pages in order to prevent content scraping.
  3. Del.icio.us, like most other websites, relies on Google for traffic, and blocking the spider from accessing its content would amount to del.icio.us shooting itself in the foot!
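
How a site might tell a real Googlebot from a spoofed one, as the second point suggests, can be sketched as follows. This is one common approach (a reverse DNS lookup followed by a confirming forward lookup, which Google documents for verifying its crawler). Whether del.icio.us used this or some other check is not known from the post, so treat it purely as an illustration. The function name `is_genuine_googlebot` and the injectable lookup parameters are hypothetical.

```python
import socket


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    and that hostname resolves back to the same IP.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default they use real DNS.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
        # Step 1: the reverse DNS name must belong to a Google crawler domain.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Step 2: the forward lookup must map back to the original IP,
        # otherwise the reverse record could itself be forged.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

A request that merely sends a Googlebot User-Agent string from an ordinary IP address would fail this check, and a site could then choose to answer it with an error page instead of real content.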

The fastest way to check the validity of Colin’s claims is by checking if Google has been able to spider and cache any pages from del.icio.us after Colin’s original discovery…

Pages spidered by Google from del.icio.us in the past 24 hours

Clicking on the link above will immediately show that the Googlebot continues to access del.icio.us.

The Problem With SEOs

While the topic of del.icio.us blocking or allowing Googlebot is relatively minor, the buzz surrounding this post highlights a much bigger problem in the industry: the fact that a self-proclaimed pundit can cry “wolf”, and in no time a whole bunch of clueless “sheep” will run scared, turning what should have otherwise been someone’s minor error into SEO gospel.