False Reports About Yahoo! Blocking Googlebot On del.icio.us
21st February 2008
A recent post on web developer Colin Cochrane’s blog, which got more attention than it deserved on Sphinn and in the SitePro newsletter, mistakenly claims that Yahoo! has decided to play hardball with the competition.
Over the last weekend Colin found that the robots.txt file on Yahoo!’s social bookmarking property del.icio.us blocked search engine spiders including Googlebot from crawling certain directories. The extract from the robots.txt file on del.icio.us pasted below shows the “offending” code:
User-agent: Googlebot
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
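Read in isolation, the "Allow: /" line and the Disallow lines can look contradictory. Here is a minimal Python sketch of how such a group is commonly evaluated, assuming the longest-match precedence that Google documents for its crawler. The helper names parse_rules and is_allowed and the sample paths are made up for this illustration, and the parser only handles the simple single-group case quoted above.

```python
# Rules copied from the del.icio.us robots.txt extract quoted above.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
"""


def parse_rules(text, agent):
    """Collect (directive, path) pairs from the group that names `agent`.

    Simplified: assumes one User-agent line per group, as in the extract.
    """
    rules = []
    in_group = False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_group = value.lower() == agent.lower()
        elif in_group and field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; on a tie Allow wins; no match means allowed."""
    best = ("allow", "")
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > len(best[1])
        tie_for_allow = len(prefix) == len(best[1]) and directive == "allow"
        if longer or tie_for_allow:
            best = (directive, prefix)
    return best[0] == "allow"


rules = parse_rules(ROBOTS_TXT, "Googlebot")
# Hypothetical paths, chosen only to exercise the rules.
for path in ("/", "/tag/seo", "/inbox", "/post?url=x"):
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Under that assumption, only paths beginning with one of the seven listed prefixes are off limits to Googlebot. Everything else on the site stays crawlable.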
Colin then spoofed the Googlebot user-agent to see what was being served to the spider when it requested one of those pages, and found that he received a 404 error.
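For readers unfamiliar with the technique, "spoofing" here just means sending a request whose User-Agent header claims to be Googlebot. A rough Python sketch of that kind of test follows. The helper names, the browser-style user-agent string and the example.com URL are placeholders for illustration; the exact URLs and tools Colin used are not given in his post as quoted here.

```python
import urllib.error
import urllib.request

# Googlebot's well-known user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# An illustrative browser-style string (placeholder, not a real browser build).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(url, user_agent):
    """Create a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this user-agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; report their code.
        return error.code


# Usage (performs real network requests, so it is left commented out):
# for name, agent in (("googlebot", GOOGLEBOT_UA), ("browser", BROWSER_UA)):
#     print(name, fetch_status("http://example.com/", agent))
```

A different status code for the two user-agents shows only that the server treats the header differently for requests from that machine. It says nothing about what the real Googlebot, crawling from Google's own network, is served.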
Yahoo! has recently tested featuring information about del.icio.us bookmarks on its search results pages. Colin, along with a bunch of other SEO enthusiasts, immediately jumped to the conclusion that Yahoo! is exercising its right to prevent competitors from benefiting from del.icio.us.
What Colin and the others are overlooking are three extremely important facts:
- The directories being blocked by the robots.txt contain general administrative pages such as the “Add a URL” page, which do not need to be indexed by the search engines.
- The 404 result is delivered because the user is spoofing a search engine spider, and the del.icio.us site is most likely smart enough to detect this and therefore does not allow access to its pages in order to prevent content scraping.
- Del.icio.us, like most other websites, relies on Google for traffic, and blocking the spider from accessing its content would amount to del.icio.us shooting itself in the foot!
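How del.icio.us tells a real Googlebot from a spoofed one is not known. One approach that Google itself documents for verifying its crawler is a reverse DNS lookup on the requesting IP followed by a forward lookup on the returned hostname. The sketch below assumes that approach. The function name, the injectable lookup parameters and the sample IP addresses are illustrative only.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-confirm it.

    The lookup functions can be swapped out, which keeps the sketch
    testable without touching the network.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        # The hostname must resolve back to the same IP address.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

A request that claims to be Googlebot in its User-Agent header but fails a check like this could then be refused or served an error page, which would be consistent with the 404 described above.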
The fastest way to check the validity of Colin’s claims is by checking if Google has been able to spider and cache any pages from del.icio.us after Colin’s original discovery…
Pages spidered by Google from del.icio.us in the past 24 hours
Clicking on the link above will immediately show that the Googlebot continues to access del.icio.us.
The Problem With SEOs
While the topic of del.icio.us blocking or allowing Googlebot is relatively minor, the buzz surrounding this post highlights a much bigger problem in the industry: the fact that a self-proclaimed pundit can cry “wolf”, and in no time a whole bunch of clueless “sheep” will run scared, turning what should have otherwise been someone’s minor error into SEO gospel.
Tags: del.icio.us, google, googlebot, seo, yahoo
One Response to “False Reports About Yahoo! Blocking Googlebot On del.icio.us”
I felt it would be prudent to respond myself.
1) The directories being blocked were not the issue. The robots.txt reference was used solely as a list of user-agents to test against.
2) If del.icio.us is serving these 404s to prevent spoofing, then it is being done to prevent proxy hijacking, not content-scraping. A content scraper could just spoof a normal Mozilla user-agent if it was worried about getting caught.
On a final note: I’m not sure at what point I became a “self-proclaimed pundit”. I simply encountered unusual behaviour from del.icio.us, did a little investigating, and wrote about what I found. People took from that what they did.