Cataloguing for the open web with RDFa and schema.org

SWIB 2014

Dan Scott / +Dan Scott

Systems Librarian, Laurentian University

First principles: modern cataloguing is web-oriented

  • Cataloguers provide access points to resources to enable discovery
  • Books / library resources are for use
  • Most discovery starts at a major search engine on the web
  • Thus we must optimize discovery for the web

Foundations of the web

  • HTML, CSS, and JavaScript
  • Links
  • Search
  • Open standards and evolution

Foundations of open web discovery

Evergreen and Google, ca. 2009

  • Googlers wanted to help in their 20% time
  • Evergreen's Ajax catalogue could not be crawled
  • Opportunity lost, painful lesson learned
  • Solid motivation

The Semantic Web

The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
Tim Berners-Lee, Scientific American, 2001

"an extension of the current web"

  • Not something separate from the current web.

"in which information is given well-defined meaning"

  • Like, adding RDFa or microdata to a web page?

"enabling computers and people to work in cooperation"

  • "work" - oriented towards getting things done

Thus the uptake of schema.org

  • Far from a perfect or complete vocabulary at launch in 2011, but:
    • It was understandable to normals (examples!)
    • Had practical uses
    • Offered compelling results

schema.org seized the moment

  • Libraries had a decade to do things "the right way"
  • Opportunity lost, painful lesson learned (hopefully)
  • Solid motivation!

Therefore, be it resolved that library systems

  • Offer persistent URIs
  • Publish sitemaps for the pages of interest
  • Gracefully adopt schema.org as their first embedded vocabulary
  • Use that vocabulary to build library-specific applications
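The sitemap resolution is cheap to satisfy once records have persistent URIs. What follows is a minimal sketch, not the implementation of any particular library system: the `build_sitemap` helper and the `catalogue.example.org` record URLs are hypothetical, and only the Python standard library is assumed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string for (url, last_modified) pairs."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical persistent record URIs with their last-modified dates.
records = [
    ("http://catalogue.example.org/record/1", "2014-11-01"),
    ("http://catalogue.example.org/record/2", "2014-11-15"),
]
sitemap_xml = build_sitemap(records)
print(sitemap_xml)
```

Including a last-modified date for each record lets a crawler skip pages that have not changed since its previous visit.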

schema.org myths

Myth: schema.org is a coarse vocabulary

  • 595 classes
  • 826 properties
  • ... and growing

Myth: schema.org can't handle library complexity

Myth: the schema.org process is closed

Linked data, not just structured data!

Progressively enrich RDFa + schema.org to break out of silos

Use the @resource, Luke!
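One way to read this slide: giving an entity a URI through the RDFa `resource` attribute turns a plain text label into a link that other data sets can point at. The sketch below renders such markup from Python; the `author_span` helper and the `authority.example.org` URI are hypothetical, and the markup is only one of several valid RDFa Lite patterns.

```python
from html import escape


def author_span(name, authority_uri):
    """Render an author as RDFa markup identified by an authority URI."""
    return (
        '<span property="author" typeof="Person" resource="{uri}">'
        '<span property="name">{name}</span></span>'
    ).format(uri=escape(authority_uri, quote=True), name=escape(name))


# Hypothetical authority URI for the author of one of the example records.
markup = author_span("David Siegel", "http://authority.example.org/person/123")
print(markup)
```

Without the `resource` attribute the author would be an anonymous node; with it, every record by the same person can refer to the same identifier.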

De-facto standards: GoodRelations

aka "How to tell the world what copies you have of things"

Best Buy

<http://www.bestbuy.com/site/a/7731019.p> md:item ( [ a schema:Product ;
schema:name "LG - G Watch for Select Android Devices - Titan Black" ;
schema:offers [ a schema:Offer ;
        schema:availability "OutOfStock" ;
        schema:itemCondition "NewCondition" ;
        schema:price "$79.99" ;
        schema:priceCurrency "USD" ;
        schema:seller [ a schema:Organization ;
                schema:name "BestBuy" ] ] ] ) .

AbeBooks.com

<http://www.abebooks.com/servlet/BookDetailsPL?bi=11682446984> md:item ( [ a schema:Book ;
schema:name "The Semantic Web: A Guide to the Future of ..."@en ;
schema:offers [ a schema:Offer ;
        schema:availability "InStock"@en ;
        schema:itemCondition "UsedCondition"@en ;
        schema:price "1.00"@en ;
        schema:priceCurrency "USD"@en ] ;
schema:publisher "Wiley"@en ] ) .

Evergreen library system

<http://acorn.biblio.org/eg/opac/record/1826746#schemarecord> a schema:Book,
        schema:Product ;
schema:name "Pull : the power of the Semantic Web to transform your business / David Siegel."@en-us ;
schema:offers [ a schema:Offer ;
        schema:availability schema:InStock ;
        schema:availableAtOrFrom "Adult Nonfiction"@en-us ;
        schema:businessFunction <http://purl.org/goodrelations/v1#LeaseOut> ;
        schema:gtin13 "9781591842774"@en-us ;
        schema:price "0.00"@en-us ;
        schema:seller <http://acorn.biblio.org/eg/opac/library/NMLFRD> ;
        schema:serialNumber "34021116328218"@en-us ;
        schema:sku "658.8072 SIE "@en-us ] .

Koha library system

<http://catalog.bywatersolutions.com/cgi-bin/koha/opac-detail.pl?biblionumber=47958#record> a schema:Book,
        schema:Product ;
    schema:name "The cathedral and the bazaar : musings on Linux and Open Source by an accidental revolutionary / "@en .

[] a schema:Offer ;
    schema:availability schema:OutOfStock ;
    schema:businessFunction <http://purl.org/goodrelations/v1#LeaseOut> ;
    schema:itemOffered <http://catalog.bywatersolutions.com/cgi-bin/koha/opac-detail.pl?biblionumber=47958#record> ;
    schema:seller "East Branch"@en ;
    schema:serialNumber "M12087334"@en ;
    schema:sku "005.4/32 (Browse shelf)"@en .

VuFind discovery system

<http://vufind.org/demo/Record/946330#record> a schema:Book,
        schema:Product ;
    schema:name "LinKnot : knot theory by computer / "@en .

[] a schema:Offer ;
    schema:availability schema:InStock ;
    schema:businessFunction <http://purl.org/goodrelations/v1#LeaseOut> ;
    schema:itemOffered <http://vufind.org/demo/Record/946330#record> ;
    schema:seller "Campus B"@en ;
    schema:serialNumber "00020739"@en ;
    schema:sku "S482.8110"@en .

Library systems

Publishing structured data, out of the box, by default!

Evergreen

As of release 2.6.0, which came out this spring, Evergreen offers RDFa for:

  • Bibliographic records
  • Holdings
  • Libraries (opening hours, contacts, locations, branch relationships)

Koha

As of release 3.14.0, which came out this spring, Koha offers RDFa for:

  • Bibliographic records
  • Holdings

Discovery layers / repositories

VuFind

As of release 2.2, which came out in January 2014, VuFind offers RDFa for:

  • Bibliographic records
  • Holdings

Blacklight

As of release 2.2, which came out in January, Blacklight offers microdata for:

  • Bibliographic records

Islandora

As of release 7.x-1.3, which came out in March 2014, Islandora offers RDFa for:

  • Bibliographic records


ScholarSphere

  • Bibliographic records

Semantic web workflows for old tasks

Key concept: Your web of documents becomes your web of data.

Creating a union catalogue

  • Old approach: branches periodically push records to a central repository
  • Semantic web approach: central repository crawls branches intelligently using their sitemaps
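The "intelligent" part of the crawl can be as simple as reading each branch's sitemap and fetching only the records that changed since the last harvest. The following is a rough sketch under that assumption; the `changed_records` helper and the `branch.example.org` URLs are hypothetical, and a real harvester would fetch the sitemap over HTTP rather than read it from a string.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_records(sitemap_xml, since):
    """Return record URLs modified on or after `since` (an ISO 8601 date).

    ISO dates in the same format sort correctly as plain strings.
    Entries with no <lastmod> are returned too, since we cannot
    tell whether they have changed.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for node in root.findall("sm:url", NS):
        loc = (node.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        lastmod = (node.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        if loc and (not lastmod or lastmod >= since):
            urls.append(loc)
    return urls


# Hypothetical sitemap published by one branch.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://branch.example.org/record/1</loc><lastmod>2014-10-01</lastmod></url>
  <url><loc>http://branch.example.org/record/2</loc><lastmod>2014-11-20</lastmod></url>
</urlset>"""

to_harvest = changed_records(sample, "2014-11-01")
print(to_harvest)
```

With this sample, only the second record is selected for harvesting, because the first was last modified before the cut-off date.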

Checking item availability

  • Old approach: Z39.50 query with custom mappings for each source
  • Old approach (?): Z39.83 (NCIP) query with custom mappings for each source
  • Semantic web approach: request record URI, check availability of offerings
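The semantic web approach can be approximated with nothing more than an HTML parser. The sketch below is deliberately naive: it is not a conformant RDFa processor, the `AvailabilityParser` and `copies_in_stock` names are hypothetical, and the sample markup is an assumed shape for a record page rather than the exact output of any system named in this talk. A real client would first fetch the record URI, for example with `urllib.request`.

```python
from html.parser import HTMLParser


class AvailabilityParser(HTMLParser):
    """Collect the value of every `availability` property on a page."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The property attribute may hold several space-separated terms.
        if "availability" in (attrs.get("property") or "").split():
            value = attrs.get("href") or attrs.get("resource") or attrs.get("content")
            if value:
                self.values.append(value)


def copies_in_stock(html):
    """Count offers whose availability value ends with 'InStock'."""
    parser = AvailabilityParser()
    parser.feed(html)
    return sum(1 for value in parser.values if value.endswith("InStock"))


# Assumed shape of a record page with two copies (offers).
sample = """
<div vocab="http://schema.org/" typeof="Book" resource="#record">
  <div property="offers" typeof="Offer">
    <link property="availability" href="http://schema.org/InStock">
  </div>
  <div property="offers" typeof="Offer">
    <link property="availability" href="http://schema.org/OutOfStock">
  </div>
</div>
"""

available = copies_in_stock(sample)
print(available)
```

The same function works against any source that publishes schema.org offers this way, which is the contrast with the per-source mappings that the older protocols required.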

Learn RDFa + schema.org today! For free!

Use the self-guided tutorial to work through a series of hands-on exercises with real library data

This presentation is available at coffeecode.net/swib14/talk.

The presentation source is available at github.com/dbs/reveal.js/tree/swib14_talk.

Cataloguing for the open web with RDFa and schema.org by Dan Scott is licensed under a Creative Commons Attribution 4.0 International License.