WEBlog -- Wouter's Eclectic Blog

Mon, 05 Sep 2005

Bad MSNbot

Still perusing my HTTP logs. It's incredible what one can find in more than a year's worth of HTTP logs. This one, I find less amusing. For the month of August 2005 _only_:
Robot       Hits / Bandwidth     Last visit
MSNBot      18814+71372.42 MB    31 Aug 2005 - 23:54
Googlebot   4798+15355.66 MB     31 Aug 2005 - 23:38

MSNbot hits my server almost four times as often as Google does, and in doing so uses far more of my bandwidth. Not that my server can't handle this, but it's still quite a lot in comparison. I can only guess what that would mean for a site with far more content than mine.
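For anyone who wants to pull similar numbers out of their own logs without a stats package, here is a rough sketch. It assumes the server writes Apache's "combined" log format and that a crawler can be recognised by a substring of its user agent; the `BOTS` table and the `tally` function are made up for this example.

```python
import re
from collections import defaultdict

# Apache "combined" log format; an assumption about how the server logs.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Substrings used to recognise each crawler; illustrative, not exhaustive.
BOTS = {"MSNBot": "msnbot", "Googlebot": "googlebot"}


def tally(lines):
    """Return {bot name: [hits, bytes sent]} for the crawlers in BOTS."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not in combined format
        agent = match.group("agent").lower()
        size = match.group("size")
        for name, needle in BOTS.items():
            if needle in agent:
                totals[name][0] += 1
                # A size of "-" means no body was sent (e.g. a 304).
                totals[name][1] += 0 if size == "-" else int(size)
                break
    return dict(totals)
```

Feeding it an open log file, as in `tally(open("access.log"))`, gives one hit count and one byte total per crawler. The file name is only a placeholder.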

I'm seriously considering blocking MSNbot.
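The polite way to do that is a robots.txt entry, which only helps if the crawler honours it. Below is a small sketch that checks such an entry with Python's standard `urllib.robotparser` before putting it live. The `msnbot` token is assumed to be the name the crawler answers to, and the `allowed` helper is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that turns away MSNbot only. "msnbot" is assumed to be
# the user-agent token the crawler looks for.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Report whether `agent` may fetch `path` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

With these rules, `allowed("msnbot", "/")` is false while `allowed("Googlebot", "/")` stays true, so other crawlers are left alone.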

User agent strings

Whatever happened to sane user agent definitions?

I was just browsing through my HTTP logs, and found this jewel:

Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.2 (like Gecko)
(Debian package 4:3.4.2-0ubuntu0hoary2)

In other words: I'm Mozilla 5.0 (or at least I pretend I am; at least I'm somewhat compatible. In reality, I'm Konqueror 3.4 running on Linux). My parsing engine is KHTML 3.4.2, but if you don't know that, it's somewhat similar to Gecko. I was packaged by Debian, but I actually mean Ubuntu Hoary. Might I make a suggestion?

Konqueror/3.4 (Linux) KHTML/3.4.2 (Ubuntu package
4:3.4.2-0ubuntu0hoary2)

No, really. Otherwise, in ten years, we'll see stuff like...

Mozilla/5.0 (compatible; Konqueror/5.0 (but actually Frobnidz/7.2))
FrobNidzHtml/7.2.1 (based on KHTML/4.2 (like Gecko) with patches from
FrobNidz Inc.) (Debian package 5:4.2.3-0ubuntu0wanky7 rebuilt for
Knoppix 5.0) (as implemented by Microsoft for Internet Explorer 8)

Or so. Which is silly.
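To see how deep the real name is buried, a string like this can be pulled apart into its top-level product tokens and its parenthesised comments. This is only a rough sketch, not a full implementation of the HTTP grammar: it copes with nested parentheses but ignores quoting, and the `split_user_agent` name is made up for this example.

```python
def split_user_agent(ua):
    """Split a User-Agent string into (products, comments).

    Products are the whitespace-separated tokens outside parentheses;
    comments are the contents of each top-level parenthesised group.
    An unterminated comment at the end of the string is dropped.
    """
    products, comments = [], []
    depth, current = 0, []
    for ch in ua:
        if ch == "(":
            if depth == 0:
                # Leaving product territory: flush what we collected.
                products.extend("".join(current).split())
                current = []
            else:
                current.append(ch)  # keep nested parentheses verbatim
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                comments.append("".join(current).strip())
                current = []
            else:
                current.append(ch)
        else:
            current.append(ch)
    if depth == 0:
        products.extend("".join(current).split())
    return products, comments
```

Run on the Konqueror string from my logs, the products come out as `Mozilla/5.0` and `KHTML/3.4.2`, and the name Konqueror only turns up inside the first comment. Run on the suggested string, `Konqueror/3.4` is the first product.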

Update: Yes, I know what the reason for those strings is, and why they are all built like that. I just happen to think it's incredibly silly to create a User Agent string that says you're based on foo, look like bar, and implement the same specs as frobnidz. And somewhere, hidden in a corner behind everything else, your real name. That this isn't going to change any time soon (because most web admins are braindead and/or don't know their job) is nothing new—but that doesn't make it less silly.

If anything, it calls for a different solution to the problems at hand. But then, I don't know what that solution would be, so let's just stick with laughing at how bad the current solution is, mm?

Referer logs are fun

Been looking through my referer logs; specifically, the search terms people have been using when they found some page on my site. It's fun to see how people end up on my site; but sometimes, it also suggests I should add a little bit more information here and there.

It's amazing how little mail I got about those things, considering my email address is listed at the bottom of every page. No, really. If you end up at my website, and you want some more information, just ask. I won't bite, I promise. Well—except if you think you'll find weblog pron here.
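For anyone who wants to dig search terms out of their own referer logs, here is a minimal sketch using Python's standard `urllib.parse`. It assumes the search engine puts the phrase in a query parameter; `q` is common, while `p` and `query` are guesses about other engines. The `search_terms` function is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters assumed to carry the search phrase. "q" is widespread;
# "p" and "query" are guesses about other engines.
SEARCH_PARAMS = ("q", "p", "query")


def search_terms(referer):
    """Return the search phrase embedded in a referer URL, or None."""
    query = parse_qs(urlsplit(referer).query)
    for name in SEARCH_PARAMS:
        values = query.get(name)
        if values:
            return values[0]  # parse_qs has already decoded "+" and %xx
    return None
```

A referer without a recognised parameter, or the "-" that servers log when no referer was sent, simply yields `None`.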