What is your robots.txt file telling your competitors about you?
Feb 12, 2008 Reputation Management, Search Engine Optimization
Have you ever thought about your robots.txt file beyond how the various crawlers interact with it? Chances are that if you have one, you haven’t looked at it since the day you created it. Well, it is time to take a fresh look and see how it appears not just to a bot’s eyes, but through the eyes of a competitor.
You would be surprised at the number of sites and companies that use their robots.txt file to keep bots out of certain directories without considering that they have pretty much handed the keys to those private areas over to their competitors. How? Many people create their robots.txt file thinking that if the bots aren’t indexing those pages, no one will find them… but when you list those directories in your robots.txt file, you are telling real people exactly where they are. And surprisingly, many of those “secret” directories can be accessed by competitors without any kind of authentication or password.
Another thing people often give away in their robots.txt file is what they are working on adding to the site, even if they haven’t officially announced it yet. And of course, since webmasters are thinking about future SEO value even in pre-launch, the directory names are very telling because they are often keyword rich. One client was able to launch an entire section on their site because they noticed a competitor was working on something very similar: the directory popped up in the competitor’s robots.txt file before the competitor had either announced it or officially added it to the site. And like any good businessperson, my client beat them to the punch and launched first. If only that competitor hadn’t jumped the gun by robots.txt-ing it, my client would have been none the wiser until launch day.
When you are working on something that you plan to launch in the future, especially something you want to launch with a splash to get publicity, don’t put it in your robots.txt file… all it takes is a competitor coming along, seeing your robots.txt entry, and discovering what you are working on (or worse, finding that directory open so they can spy on all your work-in-progress) and then launching their own copy of it first. Ideally, the work should be completely password protected or done on a test domain that isn’t connected to your site.
Don’t want to worry about what your competitors are seeing in your robots.txt file? Look at it carefully and see what it is revealing. Do you have a super secret directory listed in there? Remove it from the robots.txt file, password protect those pages, and add the robots noindex meta tag to those pages for good measure.
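One way to do that review is to pull out every Disallow entry and read the list the way a curious competitor would. The following is only a minimal sketch: the function name `disallowed_paths` and the sample file contents are made up for illustration, and it ignores which user-agent each rule belongs to.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, in file order, without duplicates."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Anything after "#" is a comment in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow":
            path = value.strip()
            # An empty Disallow means "allow everything", so it reveals nothing.
            if path and path not in paths:
                paths.append(path)
    return paths


# Hypothetical robots.txt contents, used only for this illustration.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /new-widget-launch/   # not announced yet
Disallow:
"""

for path in disallowed_paths(SAMPLE):
    print(path)
```

Each path it prints is something a visitor can type straight into a browser, so anything on that list that is sensitive or unannounced deserves a password rather than a robots.txt entry.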
Are you working on something new for the site but haven’t announced or launched it yet? Again, password protect it or move it to a completely different unassociated domain you can lock down with robots.txt (since hopefully your competitor won’t know about that site). Or the best solution is to simply leave it offline except for brief testing periods pre-launch. This means if you want to check how it is working online, you upload it for only the amount of time you need to see it, and then delete it off the server.
Play it smart with your robots.txt file so you don’t inadvertently hand your competitors the keys to your site or give them an edge by alerting them to what you are working on. And for fun, check out some of your competitors’ robots.txt files… you can usually find something interesting in at least one or two of them.

February 12th, 2008 at 7:43 pm
[...] you don’t want to leave yourself open to the world. Jennifer Slegg just posted on this over on her blog. Take a look, it’s a good [...]
February 12th, 2008 at 7:44 pm
I have to agree on this one. If you run HackerSafe or a similar type tool across your site to check for security holes they often mention this as a warning.
Assuming you are not showing directory indexes in your Apache config, you could always drop a hidden directory in there that nobody would guess, and your stuff will stay pretty secure. For example, if your robots.txt said:
Disallow: /noindex/
and then, inside /noindex/, you created a directory called releaseV2.343.21 and put your new stuff in there, it would be 99.99% safe, because they would have to guess that directory name, and then also guess the files beneath it, before they could see anything.
Just another option for those who are lazy and/or do not want to password protect a directory.
I have found some interesting data in the robots.txt file about SEO firms before as well (read post).
- Jim
February 12th, 2008 at 11:29 pm
There has also been the whole issue of cloaking your robots.txt file so that the bots get the real one while real people get something else. But then that brings up the whole “cloaking is bad when you serve up something different to people than the bots” issue, so people tend to not do it.
February 13th, 2008 at 4:39 am
Since you don’t need to specify the full URL in robots.txt, just enough to match the path, I often use just the first three letters, or so, in the robots.txt file.
For example, the scripts folder is disallowed with:
Disallow: /scr
even though it might actually be called /script or /scripts or /script3.
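The prefix behaviour described in this comment can be checked with Python's standard `urllib.robotparser` module. This is just a sketch: the helper name `blocked` and the example paths are invented for illustration, and other crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Parse the truncated rule from the comment above.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /scr",
])


def blocked(path: str) -> bool:
    """True if a generic crawler ("*") is told not to fetch this path."""
    return not rules.can_fetch("*", path)


# Hypothetical paths, used only for this illustration.
for path in ["/scripts/app.js", "/script3/", "/screenshots/", "/images/logo.png"]:
    print(path, "blocked" if blocked(path) else "allowed")
```

One thing to watch with this trick: the shorter the prefix, the more it catches. In this sketch /screenshots/ is blocked as well, which may or may not be what you want.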
February 13th, 2008 at 11:14 am
Now THAT is some insight. As always, the white hats will have to decide if snooping around competitors’ robots.txt files constitutes a breach of ethics–but I could venture a guess… And it’s good defensive practice anyway.
February 13th, 2008 at 12:41 pm
I don’t think seeing what is in a competitor’s robots.txt file is breaching ethics… after all, it is a publicly accessible file on the site, as is any page on a website unless it is password protected or requires authentication somehow. However, it would be a personal decision whether to go to a new directory you spied in the robots.txt to snoop at what was there, keeping in mind that the competitor could potentially track it back to you if you viewed the “secret but unprotected” directory without a proxy.
February 14th, 2008 at 1:46 am
I would hope that professional webmasters would do all of their development and testing on private networks.
A Linux server with Apache, MySQL etc. costs peanuts and means you can test your sites with reduced security, greater debugging output and automated tools. Even if you need to show off your site designs to remote customers, you can do that over VPN.
Remember - when you’ve put something on the public internet, you can’t undo that.
February 14th, 2008 at 7:25 am
Andy, I bet you there are a ton of small webmasters out there, particularly the self-taught DIY ones whose skills don’t go much beyond setting up a basic website, who have no idea what you just said or how to do it… they simply pay their ten bucks a month for hosting, and their coding efforts don’t go beyond simple HTML with an editor.
I think this is the reason many “mom and pop” website owners dismally fail the robots.txt thing, because they don’t know how to password protect directories because their html editor program doesn’t have it as an option. Maybe “how to password protect a directory” should be an upcoming article!
February 14th, 2008 at 11:11 am
Try the noindex, noarchive, and nofollow robots meta tags for exclusion.
Also, on Jim Matheson’s example above, be sure to put up an index page in the /noindex/ directory, or chmod the directory so it can’t be read.
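If you go the meta tag route, it is worth confirming the directives actually made it into the page. Here is a rough sketch using Python's standard `html.parser`; the class name `RobotsMetaFinder`, the function `robots_directives`, and the sample page are all made up for illustration. Keep in mind that a crawler can only see these tags on pages it is allowed to fetch.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, nofollow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical page, used only for this illustration.
PAGE = (
    '<html><head><meta name="robots" content="noindex, noarchive, nofollow">'
    "</head><body>Work in progress</body></html>"
)

print(sorted(robots_directives(PAGE)))
```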
February 15th, 2008 at 3:16 am
[...] What is your robots.txt file telling your competitors about you? - Jennifer Slegg [...]
May 5th, 2008 at 12:32 pm
Nice article! It’s actually surprising how many people rely solely on their robots.txt file to protect vital information in their systems from being accessed.
Bear in mind that not all crawlers and bots respect the robots.txt files on web servers. In fact, I’ve seen a worrying number of sites hammered by a crawler of Chinese origin that disregards the instructions in their robots.txt files.
Webmasters and site owners should remember that security is not a destination, but a never-ending journey.
Evans