If you wish to exclude your entire website from Google's index, you can place a file called robots.txt at the root of your server. This is the standard protocol that most web crawlers observe for excluding a web server or directory from an index. More information on robots.txt is available here: http://www.robotstxt.org/wc/norobots.html. Please note that Googlebot does not interpret a 401/403 response ("Unauthorized"/"Forbidden") to a robots.txt fetch as a request not to crawl any pages on the site.
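To see how a compliant crawler applies these rules, here is a minimal sketch using Python's standard-library urllib.robotparser to fetch a site's robots.txt and ask whether a given URL may be crawled. The host yourserver.com and the page path are placeholders, not real endpoints.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://yourserver.com/robots.txt")  # placeholder host
rp.read()  # fetch and parse the robots.txt file

# True if the parsed rules allow this user agent to crawl the URL
print(rp.can_fetch("Googlebot", "http://yourserver.com/some/page.html"))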
To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /

To remove your site from Google only and prevent just Googlebot from crawling your site in the future, place the following robots.txt file in your server root:
User-agent: Googlebot
Disallow: /

Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For your http protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /
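To sanity-check that these two rule sets behave as intended before deploying them, you can feed each one to Python's urllib.robotparser, which interprets Allow and Disallow lines the way most crawlers do. This is a local verification sketch only; it does not contact your server, and the page path is a placeholder.

from urllib.robotparser import RobotFileParser

http_rules = ["User-agent: *", "Allow: /"]      # contents of http://yourserver.com/robots.txt
https_rules = ["User-agent: *", "Disallow: /"]  # contents of https://yourserver.com/robots.txt

for label, rules in (("http", http_rules), ("https", https_rules)):
    rp = RobotFileParser()
    rp.parse(rules)  # parse the rules directly, without fetching
    print(label, "crawl allowed for Googlebot:", rp.can_fetch("Googlebot", "/some/page.html"))

Running this prints that crawling is allowed under the http rules and disallowed under the https rules, matching the intent described above.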