Poster: randomdestructn
Date: September 26, 2012 03:27:18pm
Forum: web
Subject: Re: Domain resellers blocking waybackmachine
I just created an account to reply, as I found your post while googling a similar problem.
I just went to load an old copy of a website of mine, only to find that the new owner of the domain has retroactively blocked access to it in the Wayback Machine.
I can understand a robots.txt update applying to all future crawls, or even reaching back a few months. But how can a new owner of a domain block pages that were published more than a decade before they took ownership?
I really hope a solution is found, as I feel the current policy will greatly degrade the usefulness of the wayback machine as time goes on.
Poster: blackduckhistorian
Date: September 27, 2012 05:06:16am
Forum: web
Subject: Re: Domain resellers blocking waybackmachine
Thank you for replying. It reassures me that I am not alone in this! I only created this account to make this situation public here.
The problem is the Wayback Machine's policy of applying robots.txt retroactively. It sounds good in theory, but it means that basically EVERY website whose ownership lapses will disappear from the archive. Effectively, the Wayback Machine becomes just a temporary archive, which I am sure was not the vision when the project commenced. For example, on the FAQ page they say:
Can I link to old pages on the Wayback Machine?
Yes! The Wayback Machine is built so that it can be used and referenced. If you find an archived page that you would like to reference on your Web page or in an article, you can copy the URL.
What they don't say, or highlight, is that as soon as someone else buys that domain, the content (and the unique archive.org URLs) will likely be gone, and gone forever. So what is even the point of the project, archiving all of this, if it is only a temporary repository?
It is now fairly standard that when a domain name lapses, one of these domain name resellers purchases it and installs a robots.txt that blocks archive.org. The archive of that particular content is therefore only secure while the original owner holds the domain. Once it passes to a new owner or a domain name reseller, all previous content is likely to disappear forever.
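For anyone unfamiliar with the mechanism, here is a rough sketch of how a robots.txt rule ends up excluding a crawler. It uses Python's standard urllib.robotparser. The robots.txt contents, the example.com URL, and the is_blocked helper are all made up for illustration, and I am assuming the crawler identifies itself as "ia_archiver". This only shows how such a rule is evaluated, not how archive.org itself applies it to old snapshots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a new domain owner might install.
# Assumption: the archive's crawler matches the user-agent "ia_archiver".
BLOCKING_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical permissive robots.txt, for comparison.
OPEN_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_blocked(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if robots_txt disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/old-page.html"  # placeholder URL
    print("blocking file:", is_blocked(BLOCKING_ROBOTS_TXT, page))
    print("open file:", is_blocked(OPEN_ROBOTS_TXT, page))
```

With these two sample files, the first check should report the page as blocked and the second as allowed. Note that the rule says nothing about dates: it is the archive's policy, not the file, that extends the block to pages captured years earlier.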
I really hope someone from archive.org is aware of this situation.
Had I known what was going to happen, I would have saved offline copies of all the relevant pages. It was a shock to discover the content gone. The original author and I are trying to recreate most of the pages from scans and other sources, but it would have been so much easier to copy and paste the text from the archived copies. I emailed archive.org about recovering specific URLs, and they did not reply.