Poster: Jeff Kaplan
Date: October 16, 2011 02:28:08pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
It is a known issue that is currently being worked on. Occasionally, though, it means that the external server (outside the archive.org system) that has the robots.txt file is not responding, in which case there is nothing we can do.
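As a rough illustration of the kind of live lookup described above (a minimal sketch using Python's standard urllib.robotparser; the domain, path, and "ia_archiver" agent name here are assumptions, not a description of the Wayback Machine's actual internals):

# Sketch: re-fetch a site's live robots.txt before replaying a capture.
import urllib.robotparser

def robots_allows_replay(domain, path, agent="ia_archiver"):
    rp = urllib.robotparser.RobotFileParser(f"http://{domain}/robots.txt")
    try:
        rp.read()  # raises urllib.error.URLError (an OSError) if the external host is down
    except OSError:
        return None  # robots.txt could not be retrieved: the error this thread is about
    return rp.can_fetch(agent, f"http://{domain}{path}")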
Poster: Manfred Wassmann
Date: November 05, 2011 11:40:26am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
Does that mean you cannot display pages if the site has been shut down? IMHO that would make the archive rather useless.
For example, Alcatel-Lucent, the current owner of Bell Labs, apparently shut down the site bell-labs.com on October 31st, about three weeks after Dennis Ritchie died, just when I was browsing a list of historic man pages of Unix Version 7 at plan9.bell-labs.com.
I thought it was a temporary failure and tried to access the pages again today, but found that even the DNS records for plan9.bell-labs.com have been deleted. So I tried to find the page in the archive, but I only get the dreaded robots.txt error message :-(
Poster: AndyFromHarvardLibraries
Date: November 28, 2011 09:00:11am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
"Does it mean, you can not display pages if the site has been shut down? IMHO that would make the archive rather useless."
Seconding that.
robots.txt isn't considered to be a security measure, and shouldn't be treated like one. Its intent is to control search results, not to control access to content. If there's something up that someone doesn't want up, they shouldn't have put it on their website to begin with.
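As a small illustration of that point (a sketch against a hypothetical example.com; the URL and disallowed path are made up): a robots.txt rule only advises polite crawlers, while a plain HTTP request still gets the content.

import urllib.request
import urllib.robotparser

# Hypothetical example.com rules; any real site's robots.txt would differ.
rp = urllib.robotparser.RobotFileParser("http://example.com/robots.txt")
rp.read()

url = "http://example.com/private/page.html"  # hypothetical disallowed path
print("polite crawler may fetch:", rp.can_fetch("SomeCrawler", url))

# A plain HTTP request ignores robots.txt entirely; the content is still served
# to anyone who asks for it (assuming the page exists on the live site).
response = urllib.request.urlopen(url, timeout=10)
print("direct fetch status:", response.status)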
If you want to skip archiving certain site contents based on the robots.txt, that's great. If it means you can't access a site while it's down, that's lame. IA: whoever pressured you into doing this is wrong, needs a reality check, and you shouldn't have kowtowed to them... unless "Universal access to all knowledge" suddenly doesn't include websites that aren't up anymore. What's going on? Are you getting lots of DMCA takedown requests that you can't deal with?
Poster: AndyFromHarvardLibraries
Date: November 28, 2011 02:07:42pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
This change? Works for me now. Bug?
Poster: Donovan K. Loucks
Date: October 16, 2011 08:52:05pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
Thanks for the info, Jeff. The site I'm interested in is no longer up, so there won't be any way to retrieve the robots.txt file from it. Does that mean I won't be able to examine the archived version of the page? Or are you referring to a different external server?
Poster: Jeff Kaplan
Date: October 16, 2011 10:19:33pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
That domain appears to be registered through 12/12/2011 according to InterNIC whois, but I am seeing that the remote host for the domain is not responding. Any captures we have will not be available until a robots.txt can be retrieved and it does not block crawling by the Wayback Machine.
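For anyone wanting to check a domain themselves, a rough sketch of the distinction (purely illustrative, not the actual check the Wayback Machine performs) is whether the name still resolves and whether the host answers a request for robots.txt:

import socket
import urllib.error
import urllib.request

def host_status(domain):
    try:
        socket.getaddrinfo(domain, 80)  # does the name still resolve?
    except socket.gaierror:
        return "no DNS record"
    try:
        urllib.request.urlopen(f"http://{domain}/robots.txt", timeout=10)
        return "robots.txt retrieved"
    except urllib.error.HTTPError:
        return "host responded, but no robots.txt served"
    except urllib.error.URLError:
        return "host not responding"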
Poster: Donovan K. Loucks
Date: October 17, 2011 08:52:00am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
Jeff,
The problem is that the co-hosting company no longer performs co-hosting:
http://dyn.com/everyeditdns-discontinued/
Does that mean the pages won't be able to be retrieved? The guy who owns the domain can no longer access the files (and apparently doesn't have backups of them) and would like to resurrect the site elsewhere.
Donovan
Poster: ladynred
Date: October 22, 2011 09:21:59am
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
This error continues. I was able to successfully view archived sites earlier this week, and today when I try to check those same URLs, this robots.txt error comes up for ANY URL I put in.
I use the archive for research. Is this something you can fix?
Poster: LadyCallie
Date: October 30, 2011 06:57:14pm
Forum: web
Subject: Re: We were unable to get the robots.txt document to display this page.
I'm having the same problem while looking for a previously archived prodigy.net site.
Any idea if this will be fixed?