Poster: Jeff Kaplan
Date: October 16, 2011 02:28:08pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
It is a known issue that is currently being worked on. Occasionally, though, it means that the external server (outside the archive.org system) that has the robots.txt file is not responding, in which case there is nothing we can do.
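As a rough illustration of the kind of live lookup described above (a minimal sketch using Python's standard urllib.robotparser; the domain, path, and "ia_archiver" agent name here are assumptions, not a description of the Wayback Machine's actual internals):

# Sketch: re-fetch a site's live robots.txt before replaying a capture.
import urllib.robotparser

def robots_allows_replay(domain, path, agent="ia_archiver"):
    rp = urllib.robotparser.RobotFileParser(f"http://{domain}/robots.txt")
    try:
        rp.read()  # raises urllib.error.URLError (an OSError) if the external host is down
    except OSError:
        return None  # robots.txt could not be retrieved: the error this thread is about
    return rp.can_fetch(agent, f"http://{domain}{path}")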
Poster: Manfred Wassmann
Date: November 05, 2011 11:40:26am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
Does that mean you cannot display pages if the site has been shut down? IMHO that would make the archive rather useless.
For example, Alcatel-Lucent, the current owner of Bell Labs, apparently shut down the site bell-labs.com on October 31st, about three weeks after Dennis Ritchie died, just when I was browsing a list of historic man pages of Unix Version 7 at plan9.bell-labs.com.
I thought it was a temporary failure and tried to access the pages again today, but found that even the DNS records for plan9.bell-labs.com have been deleted. So I tried to find the page in the archive, but I only get the dreaded robots.txt error message :-(
Poster: AndyFromHarvardLibraries
Date: November 28, 2011 09:00:11am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
"Does it mean, you can not display pages if the site has been shut down? IMHO that would make the archive rather useless."
Seconding that.
robots.txt isn't considered to be a security measure, and shouldn't be treated like one. Its intent is to control search results, not to control access to content. If there's something up that someone doesn't want up, they shouldn't have put it on their website to begin with.
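As a small illustration of that point (a sketch against a hypothetical example.com; the URL and disallowed path are made up): a robots.txt rule only advises polite crawlers, while a plain HTTP request still gets the content.

import urllib.request
import urllib.robotparser

# Hypothetical example.com rules; any real site's robots.txt would differ.
rp = urllib.robotparser.RobotFileParser("http://example.com/robots.txt")
rp.read()

url = "http://example.com/private/page.html"  # hypothetical disallowed path
print("polite crawler may fetch:", rp.can_fetch("SomeCrawler", url))

# A plain HTTP request ignores robots.txt entirely; the content is still served
# to anyone who asks for it (assuming the page exists on the live site).
response = urllib.request.urlopen(url, timeout=10)
print("direct fetch status:", response.status)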
If you want to skip archiving certain site contents based on the robots.txt, that's great. If it means you can't access a site while it's down, that's lame. IA: whoever pressured you into doing this is wrong, needs a reality check, and you shouldn't have kowtowed to them... unless "Universal access to all knowledge" suddenly doesn't include websites that aren't up anymore. What's going on? Are you getting lots of DMCA takedown requests that you can't deal with?
Poster: AndyFromHarvardLibraries
Date: November 28, 2011 02:07:42pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
This change? Works for me now. Bug?
Poster: Donovan K. Loucks
Date: October 16, 2011 08:52:05pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
Thanks for the info, Jeff. The site I'm interested in is no longer up, so there won't be any way to retrieve the robots.txt file from it. Does that mean I won't be able to examine the archived version of the page? Or are you referring to a different external server?
Poster: Jeff Kaplan
Date: October 16, 2011 10:19:33pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
That domain appears to be registered through 12/12/2011 according to InterNIC whois, but I am seeing that the remote host for the domain is not responding. Any captures we have will not be available until a robots.txt can be retrieved and it does not block crawling by the Wayback Machine.
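For anyone wanting to check a domain themselves, a rough sketch of the distinction (purely illustrative, not the actual check the Wayback Machine performs) is whether the name still resolves and whether the host answers a request for robots.txt:

import socket
import urllib.error
import urllib.request

def host_status(domain):
    try:
        socket.getaddrinfo(domain, 80)  # does the name still resolve?
    except socket.gaierror:
        return "no DNS record"
    try:
        urllib.request.urlopen(f"http://{domain}/robots.txt", timeout=10)
        return "robots.txt retrieved"
    except urllib.error.HTTPError:
        return "host responded, but no robots.txt served"
    except urllib.error.URLError:
        return "host not responding"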
Poster: Donovan K. Loucks
Date: October 17, 2011 08:52:00am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
Jeff,
The problem is that the co-hosting company no longer performs co-hosting:
http://dyn.com/everyeditdns-discontinued/
Does that mean the pages won't be able to be retrieved? The guy who owns the domain can no longer access the files (and apparently doesn't have backups of them) and would like to resurrect the site elsewhere.
Donovan
Poster: ladynred
Date: October 22, 2011 09:21:59am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
This error continues. I was able to successfully view archived sites earlier this week, and today when I try to check those same URLs, this robots.txt error comes up for ANY URL I put in.
I use the archive for research. Is this something you can fix?
Poster: LadyCallie
Date: October 30, 2011 06:57:14pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
I'm having the same problem while looking for a previously archived prodigy.net site.
Any idea if this will be fixed?