The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled in the future as well as exclude any historical pages from the Wayback Machine.

The Internet Archive follows an exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy.

Here are directions on how to automatically exclude your site. If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org.
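As a rough illustration of what such a robots.txt file might contain, here is a minimal sketch in Python that parses an example file and checks that it blocks the archive's crawler. It assumes the crawler identifies itself with the user-agent token `ia_archiver`, which is not stated on this page. The `is_excluded` helper and the example.com URL are hypothetical names used only for this example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body. ASSUMPTION: "ia_archiver" is the user-agent token
# the archive's crawler honors; this page does not name it.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_excluded(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical page on a placeholder domain.
PAGE = "http://example.com/any/page.html"

archive_blocked = is_excluded(ROBOTS_TXT, "ia_archiver", PAGE)
other_blocked = is_excluded(ROBOTS_TXT, "SomeOtherBot", PAGE)
```

With this file in place, only the named user agent is disallowed; other crawlers are unaffected because no `User-agent: *` rule is given.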

If you are emailing to ask that your website not be archived, please note that you’ll need to include the URL (web address) in the text of your message.

 

7 Responses to How can I have my site removed from the Wayback Machine?

  1. dhs says:

    Forbidden

    You don’t have permission to access / on this server.

    Above is the error I get when trying to access ANY page of sarahpalin.com.
    Can you fix?

    Thanks

  2. wayback says:

    Hi dhs,

    That appears to be what we actually archived from that domain when it was crawled. The current, live site at http://sarahpalin.com/ just reads, “This page intentionally left blank.” so it doesn’t seem likely there was ever much real content there.

    Thanks,
    Wayback team

  3. Lawrence says:

    Dear Wayback Team,

    The title of this article is “How can I have my site removed from the Wayback Machine?” but it provides only instructions on excluding your pages *from now on*.

    What about *pages we published in the past*, which we do not want archived? Even if there’s nothing embarrassing or hazardous about them at all, we may want them completely e-shredded.

    I suspect this wish is what brings most visitors to this page. Please set up a page with clear instructions on doing just that.

    • wayback says:

      Hi Lawrence,

      Placing a robots.txt file on your site does exclude historically collected pages from the Wayback Machine. From above:

      By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled in the future as well as exclude any historical pages from the Wayback Machine.

      If you are unable to do this, please email info@archive.org.

      Thanks,
      Alexis

  4. dnm says:

    Lawrence’s point still stands — the robots.txt file only helps for NEXT TIME you bother to crawl the site. In some sites’ cases, that can be months. What if we want our sites removed right now? That should be an automated process we can perform, not something we have to e-mail about, nor wait until the next crawl.

    Even a ‘check my site now’ button would help, so that the robots.txt can be picked up for the new sites and wipe out old content.

    • wayback says:

      Hi dnm,
      I think perhaps you’ve misunderstood how the robots.txt block works. When someone tries to view a site through the Wayback Machine, *before* we display the archived site to them we first go to the live web site and check the live robots.txt file to see whether it tells us to block showing the site. The “answer” from your live site may be saved for up to 24 hours, so changing your robots.txt file isn’t instantaneous but it should take effect for blocking content from the Wayback within about 24 hours.
      Thanks,
      Alexis (IA)

  5. Ichabod Mudd says:

    Brilliant answer, on the fly checks for robots.
    nice design too.
    i tested it, and it works.
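The on-the-fly check described in the reply to dnm (consult the live robots.txt before showing an archived page, and reuse the answer for up to 24 hours) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Wayback Machine's actual code: `PlaybackGate`, its `fetch_robots` callback, the injectable `clock`, and the `ia_archiver` user-agent token are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.robotparser import RobotFileParser

# The reply says the live answer "may be saved for up to 24 hours".
CACHE_TTL_SECONDS = 24 * 60 * 60


class PlaybackGate:
    """Hypothetical gate that decides whether an archived page may be shown."""

    def __init__(
        self,
        fetch_robots: Callable[[str], str],   # returns the live robots.txt text for a host
        user_agent: str = "ia_archiver",      # ASSUMPTION: token not named on this page
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_robots
        self._agent = user_agent
        self._clock = clock
        # host -> (time the rules were fetched, parsed rules)
        self._cache: Dict[str, Tuple[float, RobotFileParser]] = {}

    def _rules_for(self, host: str) -> RobotFileParser:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]  # cached answer is still fresh
        parser = RobotFileParser()
        parser.parse(self._fetch(host).splitlines())
        self._cache[host] = (now, parser)
        return parser

    def may_display(self, host: str, path: str) -> bool:
        """True if the live robots.txt (possibly cached) allows showing this path."""
        return self._rules_for(host).can_fetch(self._agent, f"http://{host}{path}")
```

Passing the fetcher and clock in as parameters keeps the sketch free of network access and makes the 24-hour delay easy to observe: a robots.txt change on the live site is ignored until the cached entry for that host expires.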
