Every subdomain on your site should have a robots.txt file that links to a sitemap and describes any crawler restrictions.
A robots.txt file is a plain-text file at the root of your site that tells search engine crawlers which parts of the site you don’t want them to access.
Use robots.txt files
Add a robots.txt file to every subdomain so you can specify sitemap locations and set web crawler rules. A robots.txt file always lives in the root folder under the name robots.txt, and it applies only to URLs with the same protocol, subdomain, domain, and port as the robots.txt URL itself. For example, http://example.com/robots.txt is the robots.txt URL for http://example.com but not for http://www.example.com. Even an empty robots.txt file is worth having: it keeps server logs cleaner by reducing 404 errors from visiting bots. Keep in mind that if you use a robots.txt file to tell search bots not to visit a certain page, that page can still appear in search results if another page links to it. To hide pages from search results, use noindex meta tags instead.
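These scoping rules can be sketched with Python’s standard urllib.robotparser module. The page URLs and the Disallow rule below are hypothetical examples, not rules from any real site:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Return the robots.txt URL that governs page_url.

    Only the scheme and the host:port are kept, because a robots.txt
    file applies solely to URLs with the same protocol, subdomain,
    domain, and port.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical rules for illustration; parse() accepts the same lines
# you would fetch from a live robots.txt file.
rules = """\
User-agent: *
Disallow: /drafts/
""".splitlines()

parser = RobotFileParser(robots_url("http://example.com/page.html"))
parser.parse(rules)

print(robots_url("http://www.example.com/a/b?q=1"))  # http://www.example.com/robots.txt
print(parser.can_fetch("*", "http://example.com/drafts/x.html"))  # False
print(parser.can_fetch("*", "http://example.com/about.html"))     # True
```

Note that robots_url keeps http://example.com and http://www.example.com separate, which is exactly why each subdomain needs its own file.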
Set sitemap locations
Each robots.txt file should specify sitemap file locations. Sitemap files contain a list of page URLs that you want indexed and are read by search bots. These files can also include metadata describing when pages were last updated and how often different pages change, which helps crawlers index your site more intelligently. A sitemap location is specified in the robots.txt file with a line such as Sitemap: http://example.com/sitemap.xml. A robots.txt file can include more than one sitemap reference.
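Putting the two sections together, a minimal robots.txt that allows all crawling and points to two sitemaps might look like the sketch below; the second sitemap file name is a hypothetical example:

```
# robots.txt for http://example.com
User-agent: *
Disallow:

Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/news-sitemap.xml
```

An empty Disallow line permits everything, and each Sitemap line stands on its own, so you can list as many sitemap files as the site needs.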
More articles in this series
➜ This article is from our comprehensive SEO Best Practices guide.