Control how Google indexes your website with a simple robots.txt file

on 16 March 2018, 10:06:04 AM

A robots.txt file allows you to control how web crawlers, such as those used by Google and Bing, access parts of your website. The file itself sits in the root folder of your website and adheres to the Robots Exclusion Protocol. This protocol allows you to control access to your website by URL or by the type of crawler.

Not all crawlers/spiders follow this protocol to the letter, and some, such as spambots and malware, ignore it completely.
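Because rules are grouped by user-agent, the protocol also lets you target a single crawler by name. As a sketch, the following record uses Googlebot (Google's crawler) purely as an illustration, blocking it from the whole site while leaving every other crawler unaffected:

Block One Crawler

User-agent: Googlebot

Disallow: /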

How a robots.txt file works

Google is indexing your website and gets to the URL www.yoursite.com/news/. Just before loading this page, the spider/crawler looks for www.yoursite.com/robots.txt and finds your robots file. The format may look like the following:

Blocking All Access

User-agent: *

Disallow: /

The above, placed inside a robots.txt file, instructs all crawlers that they should not crawl any pages on the website. To do the opposite and allow all spiders to crawl all pages on your website, your robots.txt file would look like the following:

Allow Full Access

User-agent: *

Allow: /
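Note that Allow is a later extension to the original protocol, although it is supported by the major search engine crawlers. If you prefer to stick to the original specification, an empty Disallow rule achieves the same full-access result:

User-agent: *

Disallow: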

Additional examples of robots.txt files can be found below.

Block One Folder

User-agent: *

Disallow: /folder/

Block One Page

User-agent: *

Disallow: /news.html
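
If you want to check how a crawler will interpret your rules before deploying them, Python's standard library includes a parser for this protocol. The following sketch feeds the Block One Page rules above into urllib.robotparser and asks whether a generic crawler may fetch two URLs (the /about.html path is just an illustrative placeholder):

from urllib.robotparser import RobotFileParser

# The "Block One Page" rules from the example above.
rules = [
    "User-agent: *",
    "Disallow: /news.html",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url): may this crawler load this URL?
print(parser.can_fetch("*", "https://www.yoursite.com/news.html"))   # False - blocked
print(parser.can_fetch("*", "https://www.yoursite.com/about.html"))  # True - no rule matches

Running it prints False for the blocked page and True for everything else, which is exactly how a well-behaved crawler treats the file.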

More about robots.txt 

Questions?

If you have any questions about Factory Flow or any of our integration products, please contact us using the inline chat or give us a call.
