How can I have my site removed from the Wayback Machine?
The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled in the future and exclude any historical pages from being displayed in the Wayback Machine.
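For illustration only (the directions mentioned below are authoritative), a minimal robots.txt that asks the Archive's crawler to stay away from an entire site might look like this; ia_archiver is the user agent the Wayback Machine has historically honored:

    User-agent: ia_archiver
    Disallow: /

The file must be served from the root of the domain, for example http://example.com/robots.txt, or crawlers will not find it.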
The Internet Archive uses an exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy.
Here are directions on how to automatically exclude your site. If you cannot place the robots.txt file, prefer not to, or have further questions, email us at info at archive dot org.
If you are emailing to ask that your website not be archived, please note that you’ll need to include the URL (web address) in the text of your message.
7 Responses to How can I have my site removed from the Wayback Machine?
Forbidden
You don’t have permission to access / on this server.
Above is the error I get when trying to access ANY page of sarahpalin.com.
Can you fix?
Thanks
Hi dhs,
That appears to be what we actually archived from that domain when it was crawled. The current, live site at http://sarahpalin.com/ just reads “This page intentionally left blank.”, so it doesn’t seem likely there was ever much real content there.
Thanks,
Wayback team
Dear Wayback Team,
The title of this article is “How can I have my site removed from the Wayback Machine?” but it provides only instructions on excluding your pages *from now on*.
What about *pages we published in the past*, which we do not want archived? Even if there’s nothing embarrassing or hazardous about them at all, we may want them completely e-shredded.
I suspect this wish is what brings most visitors to this page. Please set up a page with clear instructions on doing just that.
Hi Lawrence,
Placing a robots.txt file on your site does exclude historically collected pages from the Wayback Machine. From above:
If you are unable to do this, please email info@archive.org.
Thanks,
Alexis
Lawrence’s point still stands — the robots.txt file only helps for NEXT TIME you bother to crawl the site. In some sites’ cases, that can be months. What if we want our sites removed right now? That should be an automated process we can perform, not something we have to e-mail about, nor wait until the next crawl.
Even a ‘check my site now’ button would help, so that the robots.txt can be picked up for new sites and old content wiped out.
Hi dnm,
I think perhaps you’ve misunderstood how the robots.txt block works. When someone tries to view a site through the Wayback Machine, *before* we display the archived site to them we first go to the live web site and check the live robots.txt file to see whether it tells us to block showing the site. The “answer” from your live site may be saved for up to 24 hours, so changing your robots.txt file isn’t instantaneous but it should take effect for blocking content from the Wayback within about 24 hours.
Thanks,
Alexis (IA)
Brilliant answer, on-the-fly checks for robots.
Nice design too.
I tested it, and it works.
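As an illustrative aside on the on-the-fly check Alexis describes above, here is a rough sketch in Python using the standard library’s urllib.robotparser. It is not the Archive’s actual code; the is_excluded function name, the example site URL, and the choice of user agent are placeholders for the idea.

    # Rough sketch of a live robots.txt check of the sort described above.
    # Not the Wayback Machine's real implementation; per the reply above,
    # the real system may cache the answer for up to 24 hours.
    from urllib.robotparser import RobotFileParser

    def is_excluded(site_root: str, user_agent: str = "ia_archiver") -> bool:
        """Fetch the site's live robots.txt and report whether the given
        user agent is disallowed from fetching the site root."""
        parser = RobotFileParser(site_root.rstrip("/") + "/robots.txt")
        parser.read()  # downloads and parses the live robots.txt
        return not parser.can_fetch(user_agent, site_root)

    # Example: should archived pages for example.com be hidden from display?
    print(is_excluded("http://example.com/"))

If the check returns True, a viewer built along these lines would decline to display the archived pages until the live robots.txt changes again.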