Universal Access To All Knowledge

Poster: stbalbach Date: April 28, 2008 11:32:31am
Forum: texts Subject: Re: Is the archive "dark web" to Google

"whether Google indexes text inside the Internet Archive?"

No. Google has its own book-scanning service, books.google.com, and sees itself in competition with IA (more accurately, with the Open Content Alliance, which has Microsoft as a member), so it doesn't want Google searches sending users to competitors' services or products. This is one reason so many in the open-source community are unhappy with Google's book-scanning efforts: Google wants to "own" (in effect) the search space for books.

Partly as a counter to this (search engines creating monopolies of data), the OCA is creating the Open Library (openlibrary.org), whose vision is to archive or link to every scanned book, whatever project scanned it, under a single interface. I don't know for sure whether books.google.com will be included in Open Library, but I assume it would be.

Poster: Andy1342 Date: May 03, 2008 04:27:46am
Forum: texts Subject: Re: Is the archive "dark web" to Google

Many thanks - very helpful.
Andy

Poster: Administrator, Curator, or Staff brewster Date: May 04, 2008 07:39:48am
Forum: texts Subject: Re: Is the archive "dark web" to Google


All search engines are welcome to, and do, index all the books in the Internet Archive. All the metadata is harvestable in multiple ways.

The text inside the books is also available for indexing in most circumstances. The digital books sponsored by Microsoft, however, come with a "no commercial services" restriction, so their inside text is not available to commercial robot crawling.

Bulk access to the books is encouraged.

I hope this is clear.

-brewster

Poster: stbalbach Date: May 04, 2008 10:53:07am
Forum: texts Subject: Re: Is the archive "dark web" to Google

I did some tests and found that Google does not index the full text of books on IA, at least not the ones I tested: metadata yes, full text no, even for non-Microsoft books. My tests are few and anecdotal and may not be representative; I'm just passing on my findings. I can post more detailed examples if anyone wants to look into it.

Poster: Administrator, Curator, or Staff brewster Date: May 04, 2008 02:37:51pm
Forum: texts Subject: Re: Is the archive "dark web" to Google

That is odd, since we put a link on the book display page specifically because a Google representative kept telling the press they had a hard time crawling these books.

Hopefully they get better at it.

-brewster

Terms of Use (10 Mar 2001)