Structured data for libraries: RDFa and schema.org

Practical Linked Data with Open Source

ALA 2014

Dan Scott

Systems Librarian, Laurentian University

We publish plenty of Web pages

<!DOCTYPE html>
<html>
  
Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.
Edition: First edition.
</html>

HTML as presentation

The above code renders in normal browsers roughly as:

Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.
Edition: First edition.

HTML as a linked web of documents

For example:

Nix, Garth

In this case, the author's name links to some other web page, presumably about Garth Nix.
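At the markup level, that link is just an anchor element. Here is a minimal sketch. The target URL is a hypothetical placeholder, not the page the original catalogue linked to:

```html
<!-- http://example.org/authors/garth-nix is a hypothetical placeholder URL -->
<p>Author: <a href="http://example.org/authors/garth-nix">Nix, Garth</a>.</p>
```

A browser can follow the link, but nothing in the markup says that the target describes a person, or that this person is the book's author.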

Structured data in HTML: RDFa

  • The semantic web's Resource Description Framework (RDF) started with a focus on serializing triples as XML (RDF/XML), resulting in complexity like:
    • Content negotiation (to get different representations of the same data from the same URL)
    • Building an entirely separate semantic web
  • Geniuses then figured out how to decorate HTML using attributes to represent types, properties, and relationships
  • The semantic web and the document web can therefore be united!
  • Consumers can tie each property directly to the text displayed to actual humans, with much more certainty

RDFa Lite: HTML attributes

Delineating structured data in HTML using attributes.

  • vocab: the vocabulary that defines the referenced types and their properties
  • typeof: the type of the thing being described
  • property: the property of the described thing
  • resource: an identifier for a given thing on the page
  • prefix: you can mix multiple vocabularies, if you have to

Scope

The typeof attribute creates a new scope, which is (normally) limited to the HTML element on which it is defined and that element's descendants.

<!DOCTYPE html>
The scope starts here

and continues through all of the nested child elements
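This is a minimal sketch of the scope rule. It assumes the schema.org vocabulary and the Blood ties record used throughout these slides. The element choices are illustrative and are not necessarily the original slide's markup:

```html
<body vocab="http://schema.org/">
  <!-- typeof="Book" starts a new scope on this div -->
  <div typeof="Book">
    <h1 property="name">Blood ties</h1>
    <div>
      <p>
        <!-- Still in the Book's scope, however deeply nested -->
        Edition: <span property="bookEdition">First edition</span>.
      </p>
    </div>
  </div>
  <!-- After the closing </div>, the Book's scope has ended -->
  <p>This paragraph is not part of the Book description.</p>
</body>
```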

Blood ties: Plain HTML

<!DOCTYPE html>
Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.
Edition: First edition.

HTML + RDFa (basic)

<!DOCTYPE html>
Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.
Edition: First edition.

A Book and a few of its properties as plain text.
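The sketch below shows one way to mark up the basic example. It assumes the schema.org properties name, author, publisher, datePublished, and bookEdition, each holding a plain-text value. The span and br elements are an illustrative choice, not necessarily what the original slide used:

```html
<div vocab="http://schema.org/" typeof="Book">
  <!-- Only the title proper becomes the Book's name -->
  <span property="name">Blood ties</span> / Garth Nix and Sean Williams.<br>
  Author: <span property="author">Nix, Garth</span>.<br>
  Publication Info: New York, NY :
  <span property="publisher">Scholastic Inc.</span>,
  <span property="datePublished">2014</span>.<br>
  Edition: <span property="bookEdition">First edition</span>.
</div>
```

schema.org expects author and publisher to be a Person or Organization. A plain string is simply the easiest place to start, and nesting types, as on the next slide, addresses this.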

HTML + RDFa (nested types)

<!DOCTYPE html>
Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.

A Person can have properties other than just familyName and givenName
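Here is a sketch of the nested version, assuming the same schema.org vocabulary. When property and typeof appear on the same element, the new Person (or Organization) becomes the value of that property, and the element opens a new scope for the nested thing's own properties. Treating "New York, NY" as the publisher's location is an illustrative assumption:

```html
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Blood ties</span> / Garth Nix and Sean Williams.<br>
  Author:
  <!-- The Person is the Book's author; the properties inside describe the Person -->
  <span property="author" typeof="Person">
    <span property="familyName">Nix</span>,
    <span property="givenName">Garth</span></span>.<br>
  Publication Info:
  <!-- Likewise, the publisher becomes an Organization with its own properties -->
  <span property="publisher" typeof="Organization">
    <span property="location">New York, NY</span> :
    <span property="name">Scholastic Inc.</span></span>,
  <!-- Back in the Book's scope -->
  <span property="datePublished">2014</span>.
</div>
```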

HTML + RDFa (@resource)

<!DOCTYPE html>
Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.
Edition: First edition.

Avoiding redundant redundancies!
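This sketch shows one way @resource can cut down on redundancy. It uses hypothetical fragment identifiers, #bloodties and #garth-nix. Garth Nix is mentioned twice on the page. Giving both mentions the same resource identifier lets a consumer merge them into a single Person instead of producing two anonymous ones:

```html
<div vocab="http://schema.org/" typeof="Book" resource="#bloodties">
  <span property="name">Blood ties</span> /
  <!-- First mention: #garth-nix is a hypothetical identifier -->
  <span property="author" typeof="Person" resource="#garth-nix"><span property="name">Garth Nix</span></span>
  and Sean Williams.<br>
  Author:
  <!-- Second mention: same identifier, so these properties describe the same Person -->
  <span property="author" typeof="Person" resource="#garth-nix">
    <span property="familyName">Nix</span>,
    <span property="givenName">Garth</span></span>.
</div>
```

The repeated author statement does no harm. An RDF graph is a set of triples, so the two identical statements collapse into one.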

HTML + RDFa (@prefix)

<!DOCTYPE html>
<div vocab="http://schema.org/" prefix="foaf: http://xmlns.com/foaf/0.1/">
  
Blood ties / Garth Nix and Sean Williams.
Author: Nix, Garth.
Publication Info: New York, NY : Scholastic Inc., 2014.
Edition: First edition.
</div>

Because the world is described by more than one ontology.
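This sketch mixes vocabularies with prefix, using FOAF as the second vocabulary, as in the prefix declaration above. Unprefixed terms resolve against the schema.org vocab, and foaf: terms expand to http://xmlns.com/foaf/0.1/. The choice of FOAF terms is illustrative:

```html
<div vocab="http://schema.org/" prefix="foaf: http://xmlns.com/foaf/0.1/" typeof="Book">
  <span property="name">Blood ties</span> / Garth Nix and Sean Williams.<br>
  Author:
  <!-- typeof and property accept space-separated lists, so one element
       can carry terms from both vocabularies -->
  <span property="author" typeof="Person foaf:Person">
    <span property="familyName foaf:familyName">Nix</span>,
    <span property="givenName foaf:givenName">Garth</span></span>.
</div>
```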

RDFa: extracting structured data

Image generated by http://rdfa.info/play

schema.org for structured data

  • There are many different vocabularies
    • Mixing vocabularies is the norm…
    • But can also be painful for publishers and consumers
  • schema.org launched in 2011 as a joint effort between Google, Microsoft (Bing), and Yahoo; Yandex joined later that year
    • A single vocabulary for all the things! Well, all the things search engines really cared about
    • Ergo, it immediately became the only vocabulary the SEO community cared about
  • In 2013, schema.org development moved to the W3C for open discussion

Now, publish structured data!

  • Your web applications:
    • Likely use a database of some kind, with well-defined types and properties
    • With a little bit of extra effort, you can map those types and properties to schema.org types and properties
  • And Boom!
    • You've just enriched the knowledge graph
    • And enabled other applications to easily build on your data
    • And likely improved your search ranking*


* I am not an SEO consultant and this does not constitute SEO advice

Resources

RDFa

Schema.org vocabulary