Build an RDFa with schema.org search engine codelab: Google CSE

By Dan Scott

About this codelab

In this codelab, you will create a union catalogue in under 10 minutes by building a Google Custom Search Engine (CSE) that provides advanced search support for sites that use the schema.org vocabulary.

Audience: Beginner

Prerequisites: To complete this codelab, you will need a Google account.

Creating a Google Custom Search Engine

Google has already crawled much of the web, and makes the contents of its index available through its main search engine at google.com and various other specialized entry points such as Google Scholar. However, it also offers you the ability to create your own Custom Search Engine (CSE) service on top of the same index via its Custom Search offering. Google CSE offers both free and paid tiers; in this exercise, you will use the free tier.

In this exercise, you will create a simple union catalogue by building a Google CSE over the resources of a few libraries' web sites. Then you will add more advanced search support using features that have been enabled by the use of schema.org structured data on those sites.

Create a basic Google CSE

  1. Open Google Custom Search Engine in your web browser and log in to your Google account.
  2. If you have never created or been added as a maintainer of a CSE, click Create a custom search engine. Otherwise, click New search engine.
  3. In the Sites to search text field, add the following URLs:
    • acorn.biblio.org/eg/opac/record/* -- an Evergreen site
    • demo.mykoha.co.nz/cgi-bin/koha/opac-detail.pl -- a Koha site
    • find.senatehouselibrary.ac.uk/Record/* -- a VuFind site
    As you add each URL, a new text field appears.
    Aside: sadly, many catalogues use a restrictive robots.txt file to disallow search engines from indexing their site. This prevents us from including their contents in a cross-system search, and generally limits discovery of their resources outside of the catalogue itself.
  4. Edit the Name of the search engine field to your taste.
  5. Click Create to create your CSE. The Congratulations! page opens.
  6. Click Public URL and test your CSE. Use broad search terms like cars or worship to trigger results across the sites.
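
The aside in step 3 notes that a restrictive robots.txt file can keep a catalogue out of the index entirely. As a rough way to check a candidate site before adding it, the following minimal sketch uses Python's standard urllib.robotparser module. The robots.txt rules and the catalogue.example.org URLs are made up for illustration and do not describe any of the sites listed above; is_crawlable is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /eg/opac/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A record page under the disallowed path, and a page outside it.
    print(is_crawlable(SAMPLE_ROBOTS_TXT, "https://catalogue.example.org/eg/opac/record/1"))
    print(is_crawlable(SAMPLE_ROBOTS_TXT, "https://catalogue.example.org/about"))
```

To check a live site instead, you could fetch its robots.txt yourself and pass the text to the same function. A site whose record pages are disallowed is unlikely to contribute results to your CSE.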

Add filters to refine results by structured data

Google CSE supported refining search results via structured data in web pages even before schema.org was released in 2011. You can create filters for search results using pagemaps based on the structured data found in the indexed web pages, as well as via more complex approaches such as annotated sitemaps or HTTP header contents. In this exercise, you will add the ability to filter your search results via structured data pagemaps.
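
To make the idea of filtering by pagemap more concrete, here is a small conceptual sketch. It assumes each search result carries a pagemap dictionary keyed by the lowercased structured data type found on the page; that shape, the sample records, and the filter_by_type helper are all assumptions made for illustration, not output from a real search.

```python
# Hypothetical search results, invented for this example only.
# Each result is assumed to carry a "pagemap" dict keyed by structured data type.
SAMPLE_RESULTS = [
    {"title": "Example road atlas", "pagemap": {"map": [{"name": "Example road atlas"}]}},
    {"title": "Example novel", "pagemap": {"book": [{"name": "Example novel"}]}},
    {"title": "Example page without structured data", "pagemap": {}},
]


def filter_by_type(results, type_name):
    """Keep only the results whose pagemap contains the given type."""
    wanted = type_name.lower()
    return [result for result in results if wanted in result.get("pagemap", {})]


if __name__ == "__main__":
    for result in filter_by_type(SAMPLE_RESULTS, "Book"):
        print(result["title"])
```

A page with no structured data has nothing to match on, so it only appears when no filter is applied.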

  1. Open the Google Structured Data Testing Tool in a browser.
  2. In the URL field, enter the URL for one of the search results from your broad search tests and click Preview. The Google Search Results tab displays a simulation of what might be shown in search results, along with a hierarchical view of any structured data found on the page.
  3. Click Google Custom Search and scroll down to display the Structured data for filtering search results section of the page. This data follows the general pattern of more:pagemap:type:value, where type represents the structured data type, and value represents the optional value you can use to further filter search results.
  4. Return to the Google CSE page and edit your search engine. Click Search Features in the left-hand column and click the Refinements tab to begin creating filters.
  5. Click Add. The Add Refinement window opens.
  6. Enter the name Book in the Refinement name field, and more:pagemap:book in the Optional word(s) field. Click OK to save the new refinement. The window closes.
  7. Add three more filters for the following name / value pairs:
    • Map :: more:pagemap:map
    • Music album :: more:pagemap:musicalbum
    • Type unknown! :: more:pagemap:creativework
  8. Reload the Google CSE page to ensure the changes have been made, and try your broad searches again.

At the top of your search results, you should now see a set of tab names that match the names of your newly added filters, along with the tab name All for results with no filters applied. Try switching filters to see if you can identify maps, music albums, or books amongst the general search results.
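
The refinement values entered in the steps above all follow the more:pagemap:type:value pattern, with the schema.org type name in lowercase. The following minimal sketch builds such strings and appends one to a search term. The helper names refinement_label and refine_query are hypothetical, and the sketch assumes that a refinement can simply be added to the end of a query, which is roughly what selecting a filter tab amounts to.

```python
from typing import Optional


def refinement_label(type_name: str, value: Optional[str] = None) -> str:
    """Build a refinement string following the more:pagemap:type:value pattern."""
    parts = ["more", "pagemap", type_name.lower()]
    if value:
        # The value segment is optional and narrows the filter further.
        parts.append(value)
    return ":".join(parts)


def refine_query(query: str, type_name: str, value: Optional[str] = None) -> str:
    """Append a refinement to a plain search term."""
    return f"{query} {refinement_label(type_name, value)}"


if __name__ == "__main__":
    print(refinement_label("MusicAlbum"))  # more:pagemap:musicalbum
    print(refine_query("cars", "Book"))    # cars more:pagemap:book
```

Lowercasing the type name mirrors the values used in this exercise, such as musicalbum and creativework.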

Lessons learned

In this exercise, you learned how to create a simple Google Custom Search Engine that can search across arbitrary web sites and create filters based on the structured data they contain.

About the author

Dan Scott is a systems librarian at Laurentian University.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.