In this codelab, you will create a union catalogue in under 10 minutes by building a Google Custom Search Engine (CSE) that provides advanced search support for sites that use the schema.org vocabulary.
Audience: Beginner
Prerequisites: To complete this codelab, you will need a Google account.
Google has already crawled much of the web, and makes the contents of its index available through its main search engine at google.com and various other specialized entry points such as Google Scholar. However, it also offers you the ability to create your own Custom Search Engine (CSE) service on top of the same index via its Custom Search offering. Google CSE offers both free and paid tiers; in this exercise, you will use the free tier.
In this exercise, you will create a simple union catalogue by building a Google CSE over the resources of a few libraries' web sites. Then you will add more advanced search support using features that have been enabled by the use of schema.org structured data on those sites.
acorn.biblio.org/eg/opac/record/* -- an Evergreen site
demo.mykoha.co.nz/cgi-bin/koha/opac-detail.pl -- a Koha site
find.senatehouselibrary.ac.uk/Record/* -- a VuFind site
Some libraries use a robots.txt file to disallow search engines from indexing their site, which prevents us from including their contents in a cross-system search, and generally limits discovery of their resources outside of the catalog itself.
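To see how a robots.txt rule can keep catalogue records out of a search index, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `catalogue.example.org` host, and the `is_crawlable` helper are assumptions made up for illustration; they do not describe any of the sites listed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the record pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Record/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Record pages are disallowed, so a search engine would not index them.
record_ok = is_crawlable(ROBOTS_TXT, "https://catalogue.example.org/Record/123")
# Other pages on the same (hypothetical) site remain crawlable.
about_ok = is_crawlable(ROBOTS_TXT, "https://catalogue.example.org/about")
print(record_ok, about_ok)
```

A catalogue whose record URLs fail a check like this will likely contribute few or no results to the custom search engine, no matter how it is configured.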
The Google CSE supported refining search results via structured data in web pages before schema.org was released in 2011. You can create filters for search results using pagemaps based on the structured data found in the indexed web pages, as well as via more complex approaches such as annotated sitemaps or HTTP header contents. In this exercise, you will add the ability to filter your search results via structured data pagemaps.
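As a rough illustration of where pagemap types come from, the sketch below collects schema.org type names from the `itemtype` (microdata) and `typeof` (RDFa) attributes of an HTML page. It assumes that the pagemap type corresponds to the lowercased type name, which is consistent with the lowercase filter names used in this codelab but is not a statement of how Google builds its pagemaps. The sample HTML and the `schema_types` helper are hypothetical.

```python
from html.parser import HTMLParser


class SchemaTypeCollector(HTMLParser):
    """Collect type names from microdata (itemtype) and RDFa (typeof) attributes."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("itemtype", "typeof") and value:
                for token in value.split():
                    # "http://schema.org/Book" or "schema:Book" -> "book"
                    short = token.rstrip("/").rsplit("/", 1)[-1].rsplit(":", 1)[-1]
                    self.types.append(short.lower())


def schema_types(html: str) -> list:
    """Return the lowercased type names found in the HTML, in document order."""
    collector = SchemaTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.types


# Hypothetical record page fragment marked up with schema.org microdata.
SAMPLE = (
    '<div itemscope itemtype="http://schema.org/Book">'
    '<span itemprop="name">Example title</span></div>'
)
print(schema_types(SAMPLE))  # ['book']
```

Running a check like this against a record page from each catalogue is one way to guess which type names are worth trying as filters.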
Pagemap filters take the form more:pagemap:type:value, where type represents the structured data type, and value represents the optional value you can use to further filter search results.
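The filter syntax can be sketched as a small helper that assembles the string from its parts. The `pagemap_filter` function is a hypothetical convenience, not part of any Google library, and it assumes lowercase type names as used elsewhere in this codelab.

```python
from typing import Optional


def pagemap_filter(type_name: str, value: Optional[str] = None) -> str:
    """Build a filter string of the form more:pagemap:type or more:pagemap:type:value."""
    label = f"more:pagemap:{type_name.lower()}"
    if value is not None:
        # The value part is optional and narrows the filter further.
        label += f":{value}"
    return label


# The four structured data types used as filters in this codelab.
for type_name in ("book", "map", "musicalbum", "creativework"):
    print(pagemap_filter(type_name))
```

The helper does no escaping, so values that contain spaces or colons would need extra handling that is outside the scope of this sketch.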
Enter more:pagemap:book in the Optional word(s) field. Click OK to save the new refinement. The window closes.
Add further refinements in the same way for the following filters:
more:pagemap:map
more:pagemap:musicalbum
more:pagemap:creativework
At the top of your search results, you should now see a set of tab names that match the names of your newly added filters, along with the tab name All for results with no filters applied. Try switching filters to see if you can identify maps, music albums, or books amongst the general search results.
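Conceptually, each tab keeps only the results whose pagemap contains the matching structured data type. The sketch below imitates that behaviour on a few made-up result records. The record shape (a dictionary with a `pagemap` entry keyed by type name) is an assumption loosely modelled on structured search results, and the titles, the tab names, and the `filter_by_type` helper are all hypothetical.

```python
from typing import Optional

# Made-up search results; only the pagemap keys matter for this sketch.
RESULTS = [
    {"title": "Example novel", "pagemap": {"book": [{}], "creativework": [{}]}},
    {"title": "Example atlas sheet", "pagemap": {"map": [{}]}},
    {"title": "Example recording", "pagemap": {"musicalbum": [{}]}},
    {"title": "Library opening hours", "pagemap": {}},
]


def filter_by_type(results: list, type_name: Optional[str] = None) -> list:
    """Keep results whose pagemap has the given type; None keeps everything."""
    if type_name is None:
        return list(results)
    return [r for r in results if type_name in r.get("pagemap", {})]


# Hypothetical tab names mapped to pagemap types; "All" applies no filter.
TABS = {"All": None, "Books": "book", "Maps": "map", "Music albums": "musicalbum"}
for tab, type_name in TABS.items():
    titles = [r["title"] for r in filter_by_type(RESULTS, type_name)]
    print(tab, titles)
```

Pages without any recognised structured data appear only under All, which is why a filter tab can show far fewer results than the unfiltered search.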
In this exercise, you learned how to create a simple Google Custom Search Engine that can search across arbitrary web sites and create filters based on the structured data they contain.
Dan Scott is a systems librarian at Laurentian University.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.