Standard Mischief

Tips to add to TD’s privacy post

TD, over at Unforgiving Minute, posted a few tips for those people who “…blog anonymously/pseudonymously and want to keep it that way.” I had a few more tips to add, and decided to turn them into a blog entry of my own. I suggest that you read his post first.

The funny thing about the Wayback Machine.

Let’s say you’re a crazy-ass gun nut or a feminist-communist anti-Catholic lacrosse-hater. You land a cushy job getting paid to blog for someone, but you need to bury ugly or unprofessional comments you’ve made in the past. One of the worst things working against you would be the Wayback Machine, which archives not just the most recent crawl of your site but past versions of your web pages too. Remember, kids: start scrubbing your ugly past well before you expect the major scrutiny to begin.

The really odd thing I observed with the Wayback Machine is that every time it serves your archived content, it seems to fetch your site’s current robots.txt file first. If that file contains a rule that disallows crawling by the Wayback Machine, such as:

User-agent: ia_archiver # wayback machine, exclude all
Disallow: /

…then the archive will not serve the content, even if the Wayback Machine has already crawled the site. So while you may have to send an email and wait to get content removed from some search engines’ caches (Google’s, say), you can instantly block critics from your archived pages just by adding two lines to your robots.txt. Remove the lines (or comment them out) and access is instantly restored.
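To double-check that those two lines say what you think they say, here’s a minimal sketch using Python’s standard-library `urllib.robotparser`. It only shows how a generic, standards-following robots.txt parser reads the rule; it says nothing about how the Wayback Machine itself applies it. The `wayback_blocked` helper is a hypothetical name, and `ia_archiver` is simply the user-agent name from the rule above.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from above, inline comment included.
ROBOTS_TXT = """\
User-agent: ia_archiver # wayback machine, exclude all
Disallow: /
"""

# The same rule with both lines commented out.
COMMENTED_OUT = """\
# User-agent: ia_archiver # wayback machine, exclude all
# Disallow: /
"""

def wayback_blocked(robots_txt: str, path: str = "/") -> bool:
    """Return True if robots_txt forbids the ia_archiver agent from fetching path.

    Hypothetical helper: it uses Python's own robots.txt parser, not the
    Wayback Machine's, so it only checks how the rule reads.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() strips "#" comments itself
    return not parser.can_fetch("ia_archiver", path)

print(wayback_blocked(ROBOTS_TXT))     # True: the archiver is shut out
print(wayback_blocked(COMMENTED_OUT))  # False: no rules left, so access is allowed
```

Because the rule names only `ia_archiver`, other crawlers such as Googlebot are unaffected by it.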

That also means that every time someone tries to view your archived content, the Wayback Machine leaves a little calling card in your log files. Assuming your hosting provider gives you access to the raw logs, it will probably look something like this:

208.70.29.174 - - [05/Jun/2007:20:53:50 -0700] "GET /robots.txt HTTP/1.0" 200 537 "-" "ia_archiver-web.archive.org"
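If you’d rather not eyeball the whole log for those calling cards, a rough filter does the job. This is a minimal sketch that assumes Apache-style log lines like the samples in this post; `wayback_calling_cards` is a hypothetical name, and matching on the substring `ia_archiver` is an assumption based on the one sample above, since the archive’s crawler may not always identify itself that way.

```python
SAMPLE_LOG = [
    '208.70.29.174 - - [05/Jun/2007:20:53:50 -0700] "GET /robots.txt HTTP/1.0" 200 537 "-" "ia_archiver-web.archive.org"',
    '0.0.0.0 - - [02/Jun/2007:17:49:51 -0700] "GET / HTTP/1.0" 200 10461 "http://www.unforgivingminute.com/blog/2007/06/01/weekend-update-unfortunately-not-as-funny-as-norm-macdonalds/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; (R1 1.5); .NET CLR 1.1.4322; .NET CLR 2.0.50727)"',
]

def wayback_calling_cards(log_lines):
    """Yield (ip, timestamp) for each robots.txt request whose line mentions ia_archiver."""
    for line in log_lines:
        if '"GET /robots.txt ' not in line or "ia_archiver" not in line:
            continue  # not a robots.txt fetch by the archiver
        ip = line.split(" ", 1)[0]  # the client IP is the first field
        timestamp = line.partition("[")[2].partition("]")[0]  # text between [ and ]
        yield ip, timestamp

for ip, when in wayback_calling_cards(SAMPLE_LOG):
    print(ip, when)  # 208.70.29.174 05/Jun/2007:20:53:50 -0700
```

Plain substring checks keep it simple; a line that merely mentions `ia_archiver` somewhere else (in a referer, say) would also match, which is fine for a quick look.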

Good luck! Keep in mind that scrubbing your content off every server and cache on teh interweb is next to impossible, but with a little skill you can do better than the average idiot. That’s why you are using a pseudonym anyway, right?

Referer string gotcha for obsessive-compulsive Sitemeter junkies, and others.

Every time someone links to your blog in a post and a reader follows that link, two important bits of info land in your log files: the visitor’s IP address and (usually) something called a referer string, which is the address of the page the link was on.

Here’s an example from my log:

0.0.0.0 - - [02/Jun/2007:17:49:51 -0700] "GET / HTTP/1.0" 200 10461 "http://www.unforgivingminute.com/blog/2007/06/01/weekend-update-unfortunately-not-as-funny-as-norm-macdonalds/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; (R1 1.5); .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

There’s actually a lot of info here, but the part I’m talking about is that someone with the IP address 0.0.0.0 (I’ve zeroed out the real one) came to my site (GET /) via a page on TD’s site (http://www.unforgivingminute.com/blog/2007/06/01/weekend-update-unfortunately-not-as-funny-as-norm-macdonalds/).

A stats package like Sitemeter presents this same kind of information in a somewhat friendlier format.

Anyway, the point I’m trying to make is that if something like this turns up in your referer logs:
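For the curious, here’s one way to pull those fields apart programmatically. This sketch assumes the server writes Apache’s “combined” log format (host, identity, user, time, request, status, size, referer, user agent), which is what the sample lines appear to be; `parse_log_line` is a hypothetical helper, not part of any stats package.

```python
import re

# Apache/NCSA "combined" log format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None  # not a combined-format line
    fields = m.groupdict()
    # The request is "METHOD PATH PROTOCOL"; keep the path that was fetched.
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else ""
    return fields

LINE = ('0.0.0.0 - - [02/Jun/2007:17:49:51 -0700] "GET / HTTP/1.0" 200 10461 '
        '"http://www.unforgivingminute.com/blog/2007/06/01/weekend-update-unfortunately-not-as-funny-as-norm-macdonalds/" '
        '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; (R1 1.5); .NET CLR 1.1.4322; .NET CLR 2.0.50727)"')

fields = parse_log_line(LINE)
print(fields["ip"], fields["path"], fields["referer"])
```

The referer is the second-to-last quoted field; a lone "-" there means the browser sent none.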

0.0.0.0 - - [16/Mar/2007:00:57:34 -0700] "GET /2007/02/25/the-washington-post-spins-the-zumbo-saga-plus-a-timeline/ HTTP/1.0" 200 10781 "http://www.sitemeter.com/?a=stats&s=s31your-sitemeter-username" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2"

…then it’s pretty clear that someone (most likely the owner of that Sitemeter account) followed a link from their Sitemeter stats page back to your blog, probably to do a little ego surfing. Likewise, if the referer string is something like:

http://standardmischief.com/wp-admin/edit-comments.php

then you can assume the blog owner followed the link from the blog’s administration page.
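Those two tells can be checked mechanically. The sketch below looks only at the referer string. The two patterns (a Sitemeter stats URL and a WordPress `/wp-admin/` path) come from the examples above; the idea that the `s` query parameter names the Sitemeter account is inferred from that one sample URL; and `insider_referer` is a hypothetical name. Real stats services and admin pages use plenty of other URLs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

def insider_referer(referer: str) -> Optional[str]:
    """Return a short reason if the referer looks like an owner-only page, else None."""
    if not referer or referer == "-":
        return None  # no referer was sent at all
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    if host == "sitemeter.com" or host.endswith(".sitemeter.com"):
        # Assumption: the "s" parameter carries the Sitemeter account, as in the sample URL.
        account = parse_qs(parts.query).get("s")
        return f"Sitemeter stats page for {account[0]}" if account else "Sitemeter page"
    if parts.path.startswith("/wp-admin/"):
        return f"WordPress admin page on {host}"
    return None  # an ordinary public page

print(insider_referer("http://www.sitemeter.com/?a=stats&s=s31your-sitemeter-username"))
# Sitemeter stats page for s31your-sitemeter-username
print(insider_referer("http://standardmischief.com/wp-admin/edit-comments.php"))
# WordPress admin page on standardmischief.com
```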

Once you have the IP address, you can look it up (with a reverse DNS or whois lookup, say). If you get really lucky, you may find that the person is surfing from a static IP address assigned to their workplace. That’s evidently what happened to this fine young lady, and the threat of an outing was enough to give her the vapors for a few weeks.

Just a short note: we never out people here for expressing their opinions, or for nearly any other reason. I have outed one spammer, though.

Ditching your referer string.

OK, I hope the preceding was helpful, but I’m afraid it was as clear as mud. At the very least, I hope you understand that if you’re digging through your stats and simply click a link, you may be blowing your cover. Here’s a simple way around it.

1. Instead of clicking on the link, right-click on it.
2. Select Copy Link Location (or Copy Link Address, in some browsers).
3. Open a new window or a fresh tab, or reuse a window or tab you no longer need.
4. Click in the address bar to put the cursor there (if there’s already something in it, like an old URL, select all of it first).
5. Press “Ctrl-V” (or use the drop-down menu’s Paste, or whatever) to paste the URL into the address bar, then hit Enter.

You’ll still leave your IP address in the blog owner’s logs, but the referer field will be empty (a "-" in the log line), so they’ll have no idea where you got that URL.

I suppose I’ve got a bit more of the standard mischief on this topic, but perhaps I’ll save that for another post.

2007-06-06 02:00 by Standard Mischief, filed under: deranged rants

Comments

  1. TD Says :

    Hey, thanks for the linkage and for expanding on my post.

    You’re probably already aware of this, but there’s a handy-dandy little Firefox plugin to let you control your HTTP Referer strings:

    https://addons.mozilla.org/en-US/firefox/addon/953

    2007-06-06 17:49

Powered by WordPress. Theme ported to WordPress by Liu Xun. Original design by Cathayan.