Implementation Stories
Ralph LeVan, April 2005
Pears is easy to use and there are lots of record handlers. SRW/U makes
building interfaces trivial. So, I decided to provide searching for the
SiteSearch documentation.
The database is built with Pears and a spidering record handler that
returns records with the URL of the page in one field, the title from
the page in another and the body in yet another. The database description
made a phrase index for the URL, a phrase and keyword index for the title,
a keyword index for the body and a keyword index combining all fields.
This created a database with 724 records and 36K index terms from 371K
nips.
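The index layout described above can be pictured with a small toy model. This is only a sketch in Python, not Pears's database description format: the `INDEX_PLAN` structure, the index names and the `build_indexes` helper are all hypothetical, and the tokenizer is a guess at what a keyword index might do.

```python
import re
from collections import defaultdict

# Hypothetical plan mirroring the text: a phrase index for the URL, phrase
# and keyword indexes for the title, a keyword index for the body, and a
# keyword index combining all fields.
INDEX_PLAN = {
    "url": {"fields": ["url"], "types": ["phrase"]},
    "title": {"fields": ["title"], "types": ["phrase", "keyword"]},
    "body": {"fields": ["body"], "types": ["keyword"]},
    "any": {"fields": ["url", "title", "body"], "types": ["keyword"]},
}


def keywords(text):
    """Split text into lowercase alphanumeric words (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(records):
    """Map (index name, index type) -> term -> set of record numbers."""
    indexes = defaultdict(lambda: defaultdict(set))
    for rec_id, rec in enumerate(records):
        for name, plan in INDEX_PLAN.items():
            for field in plan["fields"]:
                value = rec[field]
                if "phrase" in plan["types"]:
                    # A phrase index keeps the whole field value as one term.
                    indexes[(name, "phrase")][value.strip().lower()].add(rec_id)
                if "keyword" in plan["types"]:
                    # A keyword index keeps one term per word.
                    for word in keywords(value):
                        indexes[(name, "keyword")][word].add(rec_id)
    return indexes
```

The point of the sketch is the difference between the two index types: a phrase index matches only the complete field value, while a keyword index matches any single word in it.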
I exposed the database via SRW/U. The URL for it is http://alcme.oclc.org/srw/search/SiteSearchDocumentation.
You'll get back an Explain record with a stylesheet reference. The stylesheet
renders a user interface. (It's not a very elegant interface and needs
a little work.) The '=' relation gets you adjacency searches and the
'exact' relation gets you phrase searches. Try dc.title=pears.
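As a rough illustration of the two relations, here is a minimal Python sketch that assembles CQL clauses. The `cql_clause` helper is hypothetical and the quoting rule is a simplification of CQL's; only the `dc.title` index and the '=' and 'exact' relations come from the text above.

```python
def cql_clause(index, relation, term):
    """Return a single CQL search clause such as 'dc.title=pears'."""
    # Quote the term if it is empty or contains whitespace or characters
    # that CQL treats specially (a simplified rule, not the full grammar).
    if term == "" or any(ch in term for ch in ' \t()=<>/"'):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        term = '"' + escaped + '"'
    # Named relations such as 'exact' need surrounding spaces;
    # symbolic ones such as '=' do not.
    if relation.isalpha():
        return f"{index} {relation} {term}"
    return f"{index}{relation}{term}"


# Adjacency search on the title keyword index.
adjacency = cql_clause("dc.title", "=", "record handler")
# Phrase search on the title phrase index.
phrase = cql_clause("dc.title", "exact", "record handler")
```

Either string can be typed into the search screen or passed as the `query` parameter of a URL.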
Now, one of the beauties of SRU is that you generate good URLs. So,
here's that Pears search embedded in a URL:
http://alcme.oclc.org/srw/search/SiteSearchDocumentation?query=dc.title=pears&version=1.1&maximumRecords=10
There's a parameter on the search screen that controls how many records
you get back. My default is 1.
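If you'd rather generate such URLs from a program, a minimal Python sketch might look like the following. The `sru_search_url` function and its defaults are my own illustration; only the base URL and the `query`, `version` and `maximumRecords` parameters come from the URL above. Note that `urlencode` percent-encodes the '=' inside the query, which is the safe form of the same URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://alcme.oclc.org/srw/search/SiteSearchDocumentation"


def sru_search_url(query, maximum_records=10, version="1.1", base_url=BASE_URL):
    """Build an SRU search URL with the same parameters as the example above."""
    params = {
        "query": query,                     # the CQL query
        "version": version,                 # SRU protocol version
        "maximumRecords": maximum_records,  # how many records to return
    }
    # urlencode percent-encodes reserved characters in each value.
    return base_url + "?" + urlencode(params)


url = sru_search_url("dc.title=pears")
```

Leaving `maximum_records` as a function argument mirrors the parameter on the search screen.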
All this code is checked into my CVS repository, if you want to pull
it yourself. Otherwise, I'll make a new Pears jar soon.
Janifer Gattenby, October 2005
The DBNG (Digital Bibliography of Dutch History - Digitale Bibliographie
voor de Nederlandse Geschiedenis) is a new database realized by a joint
project between OCLC PICA and the Koninklijke Bibliotheek (KB), the Dutch
Royal Library. The database was formed by combining 4 separate databases,
de-duplicating them and harmonising names, classifications and subject
headings. It now has more than 200,000 titles covering books, periodical
titles, articles and some abstracts and summaries. Whilst OCLC PICA created
the database using PSI (Pica Search and Index Engine), which is SRU-enabled,
the KB developed a web-based user interface that includes an SRU client.
Via SRU, there are more than 20 search access points and search limiters,
most of them also enabled for scanning. The searching is rich, including
date range searching, keyword truncation, Boolean and proximity searching,
and sorting by year of publication or relevance.
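To give a feel for these search features, here is a small Python sketch that assembles CQL 1.1 query strings. The helper functions are hypothetical, and the index names `dc.date` and `dc.title` are assumptions: the access points the DBNG actually exposes are not listed here, and sorting is left out because it is requested outside the CQL query.

```python
def date_range(index, start_year, end_year):
    """Date range search: two comparisons joined with a Boolean 'and'."""
    return f"({index} >= {start_year} and {index} <= {end_year})"


def truncated(index, stem):
    """Keyword truncation: '*' masks zero or more trailing characters."""
    return f"{index} = {stem}*"


def near(left, right, distance):
    """Proximity search: both clauses within `distance` units of each other."""
    return f"({left} prox/distance<={distance} {right})"


def all_of(*clauses):
    """Boolean search: every clause must match."""
    return " and ".join(clauses)


# Assumed index names, for illustration only.
query = all_of(
    date_range("dc.date", 1568, 1648),
    truncated("dc.title", "oranj"),
    near("dc.title = willem", "dc.title = oranje", 3),
)
```

The resulting string would be sent as the `query` parameter of an SRU request, assuming the server supports each of these CQL constructs on the chosen indexes.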
Result data can be returned in one of 6 XML schemas, including short or
full Dublin Core (DC), UNIMARC and PicaMARC.
The database is available at: http://www.dbng.nl/ .