Comments for Web Archiving at archive.org
http://iawebarchiving.wordpress.com
Internet Archive Web Team
Tue, 28 Jun 2011 22:03:50 +0000

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by adrolli
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-709
Tue, 28 Jun 2011 22:03:50 +0000

I found no entries for 2010 and 2011 for most of our (well-known) websites. For 2011 there are no entries, even for Microsoft.com or Apple.com.

Is this a temporary issue or the end of the Wayback Machine?

Regards,
Alf

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by Vitaliy Kuzmin
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-708
Tue, 28 Jun 2011 11:02:18 +0000

How can I force the Wayback Machine to archive an entire site and all the files on it?

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by gokitalo
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-706
Mon, 20 Jun 2011 09:48:44 +0000

I do have a fairly urgent one. I’m not sure how long you intend to have the classic interface around, but there are certain pages and sites that only seem to exist in the classic interface. This message board site, for example:

http://pub17.ezboard.com/bschoolforgiftedyoungsters

Which also went by the URL:

http://p082.ezboard.com/bschoolforgiftedyoungsters

When I type these two URLs in the classic interface, archived versions of the site appear, as you can see below:
http://classic-web.archive.org/web/*/http://pub17.ezboard.com/bschoolforgiftedyoungsters

http://classic-web.archive.org/web/*/http://p082.ezboard.com/bschoolforgiftedyoungsters
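For anyone comparing the two interfaces, the lookup URLs above follow a simple pattern: the archive host, then `/web/*/`, then the original address. A minimal sketch of that pattern follows. The function name `calendar_url` is made up for illustration, and `web.archive.org` as the host of the current interface is an assumption not stated in this comment.

```python
def calendar_url(site_url: str, host: str = "classic-web.archive.org") -> str:
    """Build the 'all captures' lookup URL for a site on a given archive host.

    The '/web/*/' pattern is copied from the URLs quoted in this comment;
    the helper itself is only an illustration.
    """
    return f"http://{host}/web/*/{site_url}"


old_board = "http://pub17.ezboard.com/bschoolforgiftedyoungsters"

# Classic interface lookup (matches the URL quoted above).
print(calendar_url(old_board))

# Same lookup against the current interface (host is an assumption).
print(calendar_url(old_board, host="web.archive.org"))
```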

When I use these same URLs with the current interface, however, no archived versions of the site appear. And while the site has changed URLs since then:

http://schoolforgiftedyoungsters.yuku.com

… both the classic and current versions of the interface say that no versions of the page have been archived. Frankly, I’m worried that if the classic interface is removed, all the older versions of this site will disappear. While the message board does continue to exist, a lot of old threads were deleted when EZBoard was hacked in 2005. However, a lot of these deleted threads still exist in the classic interface of the Internet Archive.

If the classic version of the interface is removed, however… I’m worried that all these old message board threads may be lost for good. This is a roleplaying/writing board, and I don’t think anyone who posted there wants to see some of their best work deleted.

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by kevinff
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-704
Thu, 09 Jun 2011 17:28:35 +0000

Hello,
I’ve searched everywhere but couldn’t find any decent information.
We use whitelisting for crawlers. For example, for Googlebot we verify that the reverse DNS name of the IP ends with google.com and that this name resolves back to the same IP. This lets us stop the site from throwing CAPTCHAs and other challenges at Googlebot, Bingbot, Baidu, Yandex and others, while still preventing fake bots from passing through our anti-gathering protection.
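The check described above (reverse lookup, suffix match, then forward lookup back to the same IP) can be sketched roughly as follows. This is only an illustration of the idea, not the actual code in use: the function names are made up, the two resolver parameters exist only so the logic can be tried without network access, and the forward lookup uses `socket.gethostbyname_ex`, which covers IPv4 only.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """Return the primary PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or lookup failure: treat as unverified.
        return False

    # Match on a label boundary so "evilgoogle.com" does not pass as "google.com".
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False

    try:
        # The reverse name must resolve back to the original IP.
        return ip in forward(host)
    except OSError:
        return False
```

A call such as `is_verified_crawler(client_ip, ["google.com"])` would then gate the CAPTCHA bypass. This approach only works for crawlers whose operator publishes matching reverse and forward DNS records, which is the gap the question below is about.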

However, it seems that Archive.org/Alexa are using various IPs from Amazon to collect data.
Is there a list of IPs that we can whitelist? Is there any other way to be sure that some IPs are from Archive.org/Alexa? (I’m not talking only about the user agent, as we’ve found many fake Googlebots.)

Thanks for the help

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by mariko
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-703
Sun, 01 May 2011 14:01:33 +0000

I see the Advanced Search is gone. Will that be back? I’m interested in searching for text rather than URLs.

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by glennp000
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-702
Fri, 22 Apr 2011 20:07:06 +0000

I didn’t see any advanced filters by date range on your new interface. (I’m interested in entries from the last 12 months, but not just the latest.) And if you discussed this in your FAQs, I wouldn’t know, because the FAQ link just redirects to the home page.

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by siplushwguy
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-701
Mon, 11 Apr 2011 08:34:19 +0000

Hello, Internet Archive!

Could you allow browsing archived versions of http://halflifehq.com/ ? A robots.txt file blocking you was placed on this site when it closed and its domain was parked (most domain parking services block crawling to make parked domains unsearchable). Before closing, the site contained very valuable information for the Half-Life community (the most valuable content is videos such as bullchicken360.avi, he360.avi, alieng.avi, alienslave.avi, burnacle.avi, tentacle.avi and xenome.avi).

The original owners did not block you, Internet Archive. You can check this by browsing archived versions of halflifehq.com/robots.txt: it returned a 404 in 2002, which is the year whose archive I want to see.
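To illustrate the difference between the two situations, here is a rough sketch using Python's standard `urllib.robotparser`. Everything in it is an assumption made for illustration: the parked-domain rules are a typical "block everything" file rather than the actual one on halflifehq.com, the missing (404) robots.txt is modelled as an empty file, and `ia_archiver` is used as the crawler's user-agent name.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt of the kind a domain parking service might serve.
PARKED_ROBOTS = "User-agent: *\nDisallow: /\n"

# A 404 for robots.txt means no rules at all; modelled here as an empty file.
ORIGINAL_ROBOTS = ""

video = "http://www.halflifehq.com/files/downloads/avi/he360.avi"

print(is_blocked(PARKED_ROBOTS, "ia_archiver", video))    # parked domain rules
print(is_blocked(ORIGINAL_ROBOTS, "ia_archiver", video))  # original site, no rules
```

The point of the comparison is that the block comes entirely from the rules added after the domain was parked, not from anything the original owners published.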

And if you can’t allow browsing the entire site, could you just send us archived versions of the following files?
http://www.halflifehq.com/files/downloads/avi/bullchicken360.avi !VERY IMPORTANT
http://www.halflifehq.com/files/downloads/avi/he360.avi !VERY IMPORTANT
http://www.halflifehq.com/files/downloads/avi/xenome.avi
http://www.halflifehq.com/files/downloads/avi/alieng.avi
http://www.halflifehq.com/files/downloads/avi/alienslave.avi
http://www.halflifehq.com/files/downloads/avi/burnacle.avi or /files/downloads/avi/barnacle.avi
http://www.halflifehq.com/files/downloads/avi/headcrab.avi
http://www.halflifehq.com/files/downloads/avi/tentacle.avi
And maybe the following too (if they’re not from HL: Further Data):
http://www.halflifehq.com/files/downloads/mp3/half-life1.mp3
http://www.halflifehq.com/files/downloads/mp3/half-life2.mp3

Thanks in advance

Comment on Wayback Machine & Web Archiving Open Thread, April 2011 by yahudeejay
http://iawebarchiving.wordpress.com/2011/04/07/wayback-machine-web-archiving-open-thread-april-2011/#comment-700
Fri, 08 Apr 2011 06:51:52 +0000

I’m still interested, and most of all interested in this: when will there be a chance to see Wayback Machine results for http://www.djsportal.com for June to September 2009?

Comment on Updated Wayback Machine in Beta Testing by inkdroid › xhtml, wayback
http://iawebarchiving.wordpress.com/2011/01/24/updated-wayback-machine-in-beta-testing/#comment-699
Wed, 09 Mar 2011 23:59:26 +0000

[...] Internet Archive gave the Wayback Machine a facelift back in January. It actually looks really nice, but I noticed something kinda odd. I was looking [...]

Comment on Updated Wayback Machine in Beta Testing by edsu
http://iawebarchiving.wordpress.com/2011/01/24/updated-wayback-machine-in-beta-testing/#comment-698
Wed, 09 Mar 2011 23:46:29 +0000

I ran into some problems with archived XHTML, which I documented here. I’d be interested to hear what you think.
