Cataloguing for the open web

ALA 2014: Understanding schema.org

Dan Scott / +Dan Scott

Systems Librarian, Laurentian University

The saddest robots.txt a library catalog can offer

User-Agent: *
Disallow: /

AKA "Machines are not welcome here."
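
To see how a crawler reads those two lines, here is a minimal sketch using Python's standard-library robots.txt parser. The record URL and the "welcoming" alternative are assumptions for illustration, not taken from any real catalogue.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt shown on the slide, plus a permissive alternative (assumed).
SADDEST = "User-Agent: *\nDisallow: /\n"
WELCOMING = "User-Agent: *\nDisallow:\n"

# Hypothetical record detail page.
RECORD_URL = "https://catalogue.example.org/record/123"


def crawl_allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("saddest:", crawl_allowed(SADDEST, RECORD_URL))
    print("welcoming:", crawl_allowed(WELCOMING, RECORD_URL))
```

An empty Disallow line blocks nothing, so it is the simplest way to welcome machines back.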

Libraries provide multiple access points for resources in catalogues

Users need to know that catalogues exist

Google, Bing, and Yahoo are the search tools users already have at hand

To expand access to our resources ...

We need to expose our access points to third-party search engines

and schema.org is the vocabulary those search engines created

We must not let the perfect be the enemy of the good

This is what I have been doing!

Recipe for schema.org catalogue success

  1. Allow machines to crawl your pages: robots.txt and persistent URIs
  2. Embed machine-readable schema.org metadata into ordinary catalogue pages
  3. Generate sitemaps for your record detail pages
  4. Tell the search engines about the sitemaps
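
Steps 3 and 4 of the recipe can be sketched roughly as follows. This assumes record detail pages live at persistent URLs you can enumerate; the example.org addresses and function names are hypothetical. The sitemap follows the sitemaps.org 0.9 format, and the Sitemap line in robots.txt is one common way to announce it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(record_urls, lastmod=None):
    """Return sitemap XML (as a string) listing each record detail page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for record_url in record_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = record_url
        if lastmod:
            # Optional W3C date, for example "2014-06-28".
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def robots_txt_with_sitemap(sitemap_url):
    """Return a robots.txt that allows crawling and points at the sitemap."""
    return f"User-agent: *\nDisallow:\nSitemap: {sitemap_url}\n"


if __name__ == "__main__":
    # Hypothetical persistent record URLs.
    urls = [f"https://catalogue.example.org/record/{n}" for n in (1, 2, 3)]
    print(build_sitemap(urls, lastmod="2014-06-28"))
    print(robots_txt_with_sitemap("https://catalogue.example.org/sitemap.xml"))
```

Large catalogues would need to split the list across several sitemap files and a sitemap index; that part is left out of this sketch.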

Unstructured -> structured -> linked data

  • Bibliographic records
  • Holdings, using the Product/Offer model
  • Libraries
    • Hours of operation
    • Web site
    • Location
    • Contact information
    • Branch relationships
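
As an illustration of the first two items above, here is a minimal sketch that renders a bibliographic record and its holdings as RDFa Lite, with each holding expressed as a schema.org Offer. The function name, the holdings dictionary keys, and the choice of sku for the call number are assumptions for this example, not a description of any particular library system's output.

```python
from html import escape


def render_record(title, author, isbn, holdings):
    """Return an HTML fragment marking up one record as a schema.org Book.

    `holdings` is a list of dicts with the (hypothetical) keys
    "call_number", "branch", and "available".
    """
    lines = [
        '<div vocab="http://schema.org/" typeof="Book">',
        f'  <h1 property="name">{escape(title)}</h1>',
        f'  <span property="author">{escape(author)}</span>',
        f'  <span property="isbn">{escape(isbn)}</span>',
    ]
    for holding in holdings:
        # Map the circulation status onto schema.org ItemAvailability values.
        status = "InStock" if holding["available"] else "OutOfStock"
        lines += [
            '  <div property="offers" typeof="Offer">',
            f'    <link property="availability" href="http://schema.org/{status}">',
            f'    <span property="sku">{escape(holding["call_number"])}</span>',
            '    <span property="seller" typeof="Library">',
            f'      <span property="name">{escape(holding["branch"])}</span>',
            '    </span>',
            '  </div>',
        ]
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_record(
        "Dune", "Frank Herbert", "9780441013593",
        [{"call_number": "PS3558 .E63 D8", "branch": "Main Branch", "available": True}],
    ))
```

The same attributes can be added directly to an existing catalogue template; generating the fragment from a function just makes the structure easier to see.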

Teach your systems to speak RDFa + schema.org

  • Practical linked data for open source preconference
  • My portion was a hands-on, self-guided tutorial teaching RDFa + schema.org in library systems:
    • Mark up bibliographic record detail pages
    • Express library holdings as schema.org Offer entities
    • Mark up library pages with opening hours, contact info, etc.
    • Build a simple union catalogue in five minutes using Google Custom Search Engine
    • Build a proof of concept sitemap crawler / RDFa extractor
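
A proof-of-concept sitemap crawler / RDFa extractor could look something like the sketch below. It is not the tutorial's own code: the class and function names are hypothetical, and the extractor only collects property values from attributes or text, so it is far from a conforming RDFa processor. Fetching is passed in as a function so the sketch can run without network access.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style markup."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        value = attrs.get("content") or attrs.get("href")
        if value is not None:
            self.pairs.append((prop, value))
            self._pending = None
        elif "typeof" in attrs:
            # The value is a nested entity, not text; skip it in this sketch.
            self._pending = None
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


def extract_properties(html):
    """Return the (property, value) pairs found in one HTML page."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


def crawl(sitemap_xml, fetch):
    """Map each sitemap URL to its extracted properties.

    `fetch` is any callable that takes a URL and returns HTML text.
    """
    return {url: extract_properties(fetch(url)) for url in sitemap_urls(sitemap_xml)}
```

For a real crawl, `fetch` could wrap `urllib.request.urlopen`, ideally after checking robots.txt and with a polite delay between requests.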
  • All materials available for free under a CC-BY-SA license!
  • Go to https://coffeecode.net
  • You might need to wade through a blog post or two

Addressing gaps in schema.org

In 2013, schema.org moved its public forum for enhancement proposals to W3C WebSchemas

Enter the W3C schema.org Bibliographic Extension Community Group

  • AKA SchemaBibEx
  • A rag-tag group of linked data bibliographic enthusiasts trying to establish consensus
  • Using open source library systems (Evergreen, Koha, VuFind) as proofs of concept / reference implementations

Some SchemaBibEx wins

  • Early: moving "citation" from MedicalScholarlyArticle to CreativeWork
  • Middle: Holdings-as-Offers as a best practice, with decommercialized descriptions
  • So close we can taste it: Periodicals and CreativeWork relationships

The future is now

  • Linked data is happening,
    • for real,
    • en masse,
    • thanks in no small part to schema.org
  • But Jason is bringing more of the future to you...
  • JSON-LD and Actions and more, oh my!
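
As a small taste of JSON-LD, the library details listed earlier in this deck (hours of operation, web site, location, contact information, branch relationships) might be serialized as in the sketch below. The function name, its parameters, and the use of parentOrganization for the branch relationship are assumptions made for illustration.

```python
import json


def library_jsonld(name, url, telephone, street, city, hours, parent_url=None):
    """Return a JSON-LD string describing one library branch."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
        },
        # schema.org openingHours strings, for example "Mo-Fr 09:00-17:00".
        "openingHours": hours,
    }
    if parent_url:
        # Link a branch to its parent library system by identifier.
        doc["parentOrganization"] = {"@type": "Library", "@id": parent_url}
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Every value below is made up for the example.
    print(library_jsonld(
        "North Branch", "https://library.example.org/north", "+1-555-0100",
        "1 Example Street", "Exampleville",
        ["Mo-Fr 09:00-17:00", "Sa 10:00-14:00"],
        parent_url="https://library.example.org/",
    ))
```

The resulting string would typically be embedded in the library's page inside a script element with type application/ld+json.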