Every subdomain on your site should have a robots.txt file that links to a sitemap and describes any crawler restrictions.
A robots.txt file is a plain-text file at the root of your site that tells search engine crawlers which parts of the site you don’t want them to access.
Use robots.txt files
Add a robots.txt file to every subdomain so you can specify sitemap locations and set web crawler rules. A robots.txt file always lives in the root folder under the name robots.txt, and it applies only to URLs with the same protocol, subdomain, domain, and port as the robots.txt URL itself. For example, http://example.com/robots.txt is the robots.txt URL for http://example.com but not for http://www.example.com. Even an empty robots.txt file is worth having: it keeps server logs cleaner by reducing 404 errors from visiting bots. Keep in mind that if you use a robots.txt file to tell search bots not to visit a certain page, that page can still appear in search results if another page links to it. To hide pages from search results, use noindex meta tags instead.
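These scoping rules can be sketched with Python’s standard urllib.robotparser module. The page URLs and the Disallow rule below are hypothetical examples, not rules from any real site:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Return the robots.txt URL that governs page_url.

    Only the scheme and the host:port are kept, because a robots.txt
    file applies solely to URLs with the same protocol, subdomain,
    domain, and port.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical rules for illustration; parse() accepts the same lines
# you would fetch from a live robots.txt file.
rules = """\
User-agent: *
Disallow: /drafts/
""".splitlines()

parser = RobotFileParser(robots_url("http://example.com/page.html"))
parser.parse(rules)

print(robots_url("http://www.example.com/a/b?q=1"))  # http://www.example.com/robots.txt
print(parser.can_fetch("*", "http://example.com/drafts/x.html"))  # False
print(parser.can_fetch("*", "http://example.com/about.html"))     # True
```

Note that robots_url keeps http://example.com and http://www.example.com separate, which is exactly why each subdomain needs its own file.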
Set sitemap locations
Each robots.txt file should specify sitemap file locations. Sitemap files contain a list of page URLs that you want indexed and are read by search bots. These files can also include metadata describing when pages were last updated and how often different pages change, which helps crawlers index your site more intelligently. A sitemap location is specified in the robots.txt file with a line such as Sitemap: http://example.com/sitemap.xml. A robots.txt file can include more than one sitemap reference.
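Putting the two sections together, a minimal robots.txt that allows all crawling and points to two sitemaps might look like the sketch below; the second sitemap file name is a hypothetical example:

```
# robots.txt for http://example.com
User-agent: *
Disallow:

Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/news-sitemap.xml
```

An empty Disallow line permits everything, and each Sitemap line stands on its own, so you can list as many sitemap files as the site needs.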
More articles in this series
➜ This article is from our comprehensive SEO Best Practices guide.