How to Properly Format a robots.txt File

How to Write a robots.txt File

Search Engine Optimization has gone through a metamorphosis over the past few years. We’ve gone from keywords to content and from rankings to traffic. However, among all the changes, some things remain the same. You just can’t forget the basics, like how to write a good robots.txt file. This file is an essential element of a website, and if it is written improperly, it can completely block your site from the Search Engine Results Pages of both Google and Bing. If that doesn’t sound very appealing, you must learn how to properly format a robots.txt file in order to avoid crippling your site’s traffic.

So you wanna create a robots.txt file? That’s great!

Here’s the most basic example that you’ll ever see. But be forewarned, it effectively does nothing, nada, zip. This is the robots.txt equivalent of the ol’ <meta name="robots" content="index, follow" /> uselessness. You’re effectively telling search engines to crawl your site. Guess what? THAT’S THEIR JOB! But, for the sake of learning, let’s take a look anyway.

User-agent: *
Allow: /

What is that robots.txt file saying? Well, it’s telling search engines “come hither, you’re invited, crawl my site!” But what do these short lines of code mean?

User-agent: *

The User-agent line is where you can name the search engine bots, or crawlers, one-by-one. In other words, here you are highlighting which search engine’s bots you are addressing. The biggest ones are obviously googlebot and bingbot. There’s another, msnbot, that’s attached to Bing as well. However, to communicate with all crawlers, rather than individually listing hundreds of search engines, you can simply put an asterisk on the User-agent line. This addresses all crawlers on the internet.

Allow: /

So what is this noise? With the “Allow” command, you are allowing whatever bots you addressed in the User-agent line to crawl your entire site. You can also change this command to “Disallow” to keep those bots from crawling it.
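As a quick sanity check of the two commands above, here is a minimal sketch that uses Python’s standard urllib.robotparser module to parse a rule set and ask whether a crawler may fetch a URL. The helper name allowed and the www.yourwebsite.com address are placeholders for illustration, not part of any tool.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nAllow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Placeholder URL for illustration only.
page = "http://www.yourwebsite.com/some-page.html"

print(allowed(ALLOW_ALL, "googlebot", page))  # Allow: / lets every bot in
print(allowed(BLOCK_ALL, "googlebot", page))  # Disallow: / keeps every bot out
```

Swapping a single word flips the result for every crawler, which is exactly why a typo in this file can be so costly.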

Plus, you can use these commands on specific folders, or even files, on your domain. Just replace the lone slash with the path of a folder, say:

User-agent: bingbot
Disallow: /
Allow: /example-folder/

What you’ve done here is effectively block Bing from the entire site (“Disallow: /”), except for the “example-folder” folder (“Allow: /example-folder/”).
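Why does the Allow line win even though Disallow: / also matches? Under the precedence that Google documents (and that RFC 9309 standardizes), the most specific rule, meaning the one with the longest matching path, takes priority, and a tie goes to Allow. The sketch below illustrates that idea for plain path prefixes only. The function name is_allowed and the rule tuples are hypothetical, and other crawlers may resolve conflicts differently.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (directive, path_prefix) rules."""
    best_len = -1
    best_allow = True  # a path that matches no rule is crawlable
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            best_allow = allow
    return best_allow


# The bingbot group from the example above.
bingbot_rules = [("Disallow", "/"), ("Allow", "/example-folder/")]

print(is_allowed(bingbot_rules, "/example-folder/page.html"))
print(is_allowed(bingbot_rules, "/contact.html"))
```

Because the decision depends on path length rather than line order, the result here would be the same if the Allow line came first.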

You can also do this with types of files. Let’s say you don’t want any PDFs to appear in search results. You can use the asterisk again, this time as a wildcard that matches any run of characters, and end the pattern with a dollar sign, which marks the end of the URL.

Disallow: /*.pdf$
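To make the asterisk and dollar sign concrete, here is a rough sketch that translates a robots.txt path pattern into a regular expression, assuming the wildcard semantics that Google and Bing document (* matches any sequence of characters, a trailing $ marks the end of the URL). The helper names and sample paths are made up for this example.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """True if the pattern matches from the start of the path."""
    return pattern_to_regex(pattern).match(path) is not None


# Hypothetical paths for illustration.
print(matches("/*.pdf$", "/brochures/2014-model.pdf"))         # ends in .pdf
print(matches("/*.pdf$", "/brochures/2014-model.pdf?page=2"))  # has a query string
print(matches("/*.pdf$", "/brochures/index.html"))             # not a PDF
```

Without the dollar sign, the pattern would also catch any URL that merely contains “.pdf” somewhere after the first slash.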

There is another command to consider. If your site is getting too much crawler traffic, you can tell the crawlers to take it easy with the Crawl-delay parameter. Here, you can set whichever crawl rate you wish for your site.

For example:

User-agent: msnbot
Crawl-delay: 1

This means that you want your site to be crawled slowly. The larger the Crawl-delay number, the slower your site will be crawled. Keep in mind that support varies by crawler: Bing honors Crawl-delay, while Googlebot ignores it.

Crawl-delay setting     Index refresh speed
No crawl delay set      Normal
1                       Slow
5                       Very slow
10                      Extremely slow
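If you want to confirm the value a crawler would read from the msnbot example, Python’s standard urllib.robotparser module exposes the parsed Crawl-delay through crawl_delay() (available since Python 3.6). This is only a sketch; how a given crawler interprets the number is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: msnbot
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

msnbot_delay = parser.crawl_delay("msnbot")    # value set in the msnbot group
other_delay = parser.crawl_delay("googlebot")  # no group applies to this bot

print(msnbot_delay, other_delay)
```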

 

Good bots, bad bots

As you format a robots.txt file, keep in mind that there are a few bad bots that should be avoided like the plague. Here’s our list of bad bots (Decepticons?). We apologize if any bots are insulted, but man, keep your spam and crawlers to yourself.

  • Baiduspider
  • YandexBot
  • MJ12Bot
  • rogerbot
  • Ahrefsbot
  • semaltspider

So, what happens if you find a crawler, like those above, that you don’t like? How do you block it? For this, you would use the ol’ trusty Disallow command below.

User-agent: Baiduspider
Disallow: /
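A rule group only applies to the crawler it names. As a hedged illustration using Python’s standard urllib.robotparser module (the URL is a placeholder), the snippet below checks that Baiduspider is shut out while a crawler with no matching group stays unaffected. Remember that robots.txt is a request, so this only shows what a well-behaved crawler would conclude.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "http://www.yourwebsite.com/"  # placeholder address
baidu_ok = parser.can_fetch("Baiduspider", url)
google_ok = parser.can_fetch("googlebot", url)

print(baidu_ok, google_ok)
```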

Plus, you should also write the sitemap location in your robots.txt file. While Google and Bing Webmaster Tools allow you to submit the location of the sitemap.xml file, there are other search engines out there, believe it or not, and the easier you make it for them to find your pages, the better your chances of showing up in their results.

Sitemap: http://www.yourwebsite.com/sitemap.xml
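To see what a parser picks up from that line, here is a small sketch using site_maps() from Python’s standard urllib.robotparser module (added in Python 3.8). The surrounding rules are just filler so the file looks realistic.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /

Sitemap: http://www.yourwebsite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if there are none
print(sitemaps)
```

The Sitemap line is independent of any User-agent group, so it can sit anywhere in the file.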

So, let’s see all of this together. If you’re a beginning SEO or Webmaster, feel free to copy and use it for your own website.

User-agent: *
Allow: /

User-agent: googlebot
Allow: /
Disallow: /Bad-Folder/
Allow: /Bad-Folder/Good-File.html

User-agent: bingbot
Allow: /
Disallow: /Good-Folder/*.pdf$
Disallow: /Bad-Folder/
Crawl-delay: 1

User-agent: Baiduspider
Disallow: /

User-agent: YandexBot
Disallow: /

Sitemap: http://www.yourwebsite.com/sitemap.xml
Sitemap: http://www.yourwebsite.com/video-sitemap.xml

Good luck out there. If you have any SEO questions, or want to add additional crawlers that should be avoided like the plague, leave us a comment below!


Dan Patrick is the Digital Media Group Manager at Dealer Product Services. He also served as an SEO Analyst and Copywriter at Walgreens, CouponCabin, and LocalLaunch! Prior to his career in Search Engine Optimization, Dan was an award-winning newspaper and magazine journalist in the Chicagoland area. In his personal life, Dan is a major automotive enthusiast, having attended major events including the Rolex 24 at Daytona, 12 Hours of Sebring and the Indianapolis 500. Thus far, he has attended major events for Indycar, NASCAR, NHRA, TUSCC, ALMS, GRAND-AM, TRANS-AM, and more.


Dan Patrick

Dani,

Of course, but there are certain meta tags that are useful and many that are not. Depending on your level of knowledge, you need to have a Meta Description so YOU dictate what your search engine listings say – not the search engines, as they can sometimes spit out nonsensical listings. Meta Descriptions won’t help a bit with rankings, but they will help keep you in the driver’s seat of your website/brand. Also, you need Facebook Admin and Open Graph tags, Twitter Cards, and Publisher tags for Facebook, Twitter and Google+, respectively. As you were alluding to, name="robots" content="noodp,noydir" is a good standby tag, as it tells search engines not to pull your listing’s title and description from the Open Directory Project and Yahoo Directory.

You’ll also need a sitemap.xml file as well.

These are the bare basics today.

Meta keywords are long obsolete and they can only hurt you if you use keywords that are not relevant to the content on your page. There are some other Meta Tags that I use quite a bit, but the ones I covered above are what you truly need today. Also, as you

And, as always – track, track, track on Google and Bing Webmaster Tools. Experimentation is an SEO’s best friend!
