If you wish to exclude your entire website from Google's index, you can place a file called robots.txt at the root of your server. This is the standard protocol that most web crawlers observe for excluding a web server or directory from an index. More information on robots.txt is available here: http://www.robotstxt.org/wc/norobots.html. Please note that Googlebot does not interpret a 401/403 response ("Unauthorized"/"Forbidden") to a robots.txt fetch as a request not to crawl any pages on the site.
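To see how a compliant crawler applies these rules, here is a minimal sketch using Python's standard-library urllib.robotparser to fetch a site's robots.txt and ask whether a given URL may be crawled. The host yourserver.com and the page path are placeholders, not real endpoints.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://yourserver.com/robots.txt")  # placeholder host
rp.read()  # fetch and parse the robots.txt file

# True if the parsed rules allow this user agent to crawl the URL
print(rp.can_fetch("Googlebot", "http://yourserver.com/some/page.html"))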
To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /

To remove your site from Google only and prevent just Googlebot from crawling your site in the future, place the following robots.txt file in your server root:
User-agent: Googlebot
Disallow: /

Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For your http protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /
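To sanity-check that these two rule sets behave as intended before deploying them, you can feed each one to Python's urllib.robotparser, which interprets Allow and Disallow lines the way most crawlers do. This is a local verification sketch only; it does not contact your server, and the page path is a placeholder.

from urllib.robotparser import RobotFileParser

http_rules = ["User-agent: *", "Allow: /"]      # contents of http://yourserver.com/robots.txt
https_rules = ["User-agent: *", "Disallow: /"]  # contents of https://yourserver.com/robots.txt

for label, rules in (("http", http_rules), ("https", https_rules)):
    rp = RobotFileParser()
    rp.parse(rules)  # parse the rules directly, without fetching
    print(label, "crawl allowed for Googlebot:", rp.can_fetch("Googlebot", "/some/page.html"))

Running this prints that crawling is allowed under the http rules and disallowed under the https rules, matching the intent described above.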