Poster: stbalbach
Date: October 18, 2008 08:47:54am
Forum: texts
Subject: On searching and Internet Archive
Internet Archive has a very powerful and complex searching apparatus. I've been using it for years and still keep discovering new ways of tweaking it to find the most complete results. Because of the nature of the project, the IA database is not very consistent, so finding books in the archive takes some very elaborate searches.
For example, one of the most basic functions of a library is to find all works by an author. Here is the search needed to find all the books by "Jerome K. Jerome":
mediatype:(texts) -contributor:gutenberg AND (subject:"Jerome, Jerome Klapka, 1859-1927" OR subject:"Jerome, Jerome K. (Jerome Klapka), 1859-1927" OR creator:"Jerome, Jerome K. (Jerome Klapka), 1859-1927" OR creator:"Jerome, Jerome Klapka, 1859-1927" OR creator:"Jerome Klapka Jerome" OR creator:"Jerome K. Jerome" OR title:"Jerome Klapka Jerome" OR title:"Jerome K. Jerome" OR description:"Jerome Klapka Jerome" OR description:"Jerome K. Jerome")
Which translates to a URL of:
http://www.archive.org/search.php?query=mediatype%3A%28texts%29%20-contributor%3Agutenberg%20AND%20%28subject%3A%22Jerome%2C%20Jerome%20Klapka%2C%201859-1927%22%20OR%20subject%3A%22Jerome%2C%20Jerome%20K.%20%28Jerome%20Klapka%29%2C%201859-1927%22%20OR%20creator%3A%22Jerome%2C%20Jerome%20K.%20%28Jerome%20Klapka%29%2C%201859-1927%22%20OR%20creator%3A%22Jerome%2C%20Jerome%20Klapka%2C%201859-1927%22%20OR%20creator%3A%22Jerome%20Klapka%20Jerome%22%20OR%20creator%3A%22Jerome%20K.%20Jerome%22%20OR%20title%3A%22Jerome%20Klapka%20Jerome%22%20OR%20title%3A%22Jerome%20K.%20Jerome%22%20OR%20description%3A%22Jerome%20Klapka%20Jerome%22%20OR%20description%3A%22Jerome%20K.%20Jerome%22%29
(I'm not even entirely confident this gets them all.)
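The step from search string to URL is plain percent-encoding, so that part at least is easy to script. A minimal sketch in Python, assuming only the search.php?query= address visible in the URL above; the function name search_url is made up for illustration and is not anything IA provides:

```python
from urllib.parse import quote

# Base address taken from the URL in this post.
BASE = "http://www.archive.org/search.php?query="

def search_url(query: str) -> str:
    """Percent-encode a search string and append it to the base address.

    safe="" makes quote() encode ':', '(', ')', ',', '"' and spaces,
    which matches the style of the URL above. Letters, digits and
    '-', '.', '_', '~' are left as they are.
    """
    return BASE + quote(query, safe="")

if __name__ == "__main__":
    print(search_url('mediatype:(texts) AND creator:"Jerome K. Jerome"'))
```

Pasting the full Jerome K. Jerome search string into search_url should give a URL in the same style as the one above.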
As you can see, 99% of users will never figure this out or take the time to type it in. What I'm wondering is whether there is any way to automate the searching. It should be possible to "search for all works by Jerome K. Jerome" and have this search string, or a number of variants, created automatically by a script, which the user can then click on. I realize it's more complex than that, because some authors have a "Sir", some have no middle name, etc. It's just a question of working out the various permutations from the user's input and offering some pre-canned search strings to try: click on an author's name, and a page comes up with some recommended search strings.
I've already written some simple Unix scripts to create search strings with various options such as exact match, fuzzy match, "sir" or no "sir", middle initial or full middle name, etc. I use the scripts to create search URLs for the External Links sections of Wikipedia author articles. See the Wikipedia article for "Jerome K. Jerome" for example: the search link there was created automatically with a script.
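To give an idea of the permutation step, here is a rough sketch in Python. It is not one of my scripts; the function names (name_forms, author_query) and the choice of which name forms go in which field are assumptions made for illustration, modelled on the Jerome K. Jerome search string above. It only handles the simple case of first name, middle name, last name and life dates, with no "Sir" and no fuzzy matching.

```python
def name_forms(first, middle, last, years):
    """Return (catalog-style forms, natural-order forms) for one author."""
    initial = middle[0] + "."
    catalog = [
        f"{last}, {first} {middle}, {years}",
        f"{last}, {first} {initial} ({first} {middle}), {years}",
    ]
    natural = [
        f"{first} {middle} {last}",
        f"{first} {initial} {last}",
    ]
    return catalog, natural

def author_query(first, middle, last, years):
    """Build one OR-ed search string covering every field/name combination."""
    catalog, natural = name_forms(first, middle, last, years)
    clauses = []
    # Catalog-style headings are tried in subject and creator.
    for field in ("subject", "creator"):
        clauses += [f'{field}:"{name}"' for name in catalog]
    # Natural-order names are tried in creator, title and description.
    for field in ("creator", "title", "description"):
        clauses += [f'{field}:"{name}"' for name in natural]
    return "mediatype:(texts) AND (" + " OR ".join(clauses) + ")"

if __name__ == "__main__":
    print(author_query("Jerome", "Klapka", "Jerome", "1859-1927"))
```

For Jerome K. Jerome this produces the same ten field/name clauses as the search string above, though not in the same order. Handling "Sir", missing middle names and so on would mean adding more entries to the two lists in name_forms.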
This post was modified by stbalbach on 2008-10-18 15:47:54