RDFa with schema.org codelab: Book

By Dan Scott

About this codelab

In this codelab, you're going to take a catalog page that describes a book and enhance it so that it contains structured data. You will use the schema.org vocabulary and express it via RDFa attributes.

Audience: Beginner

Prerequisites: To complete this codelab, you will need a basic familiarity with HTML. The exercises can be found in codelab.zip, with the solutions found in the rdfa_exercises subdirectory. There are frequent checkpoints throughout the codelab, so if you get stuck at any point, you can use the checkpoint file to resume and work through this codelab at your own pace.

From basic HTML to RDFa: first steps

In this exercise, you will learn the basic steps required to add simple RDFa structured data to an existing library catalog page for a book.

View the page source HTML

Open step1/rdfa_book.html in a text editor. You should see something like the following HTML source for the web page:

<!DOCTYPE html>
<html>
<head>
  <title>Las Vegas-Clark County Library District /All Locations</title>
  <style>...</style>
</head>

<body>
  <div id="coverImage">
    <div class="jacket"><img src=
    "http://store.scholastic.com/content/stores/media/products/58/9780545522458_default_pdp.gif"
    border="0"></div>

    <div class="bibMedia"><img src="/screens/media_book.gif" alt="BOOKS"></div>
  </div>

  <table id="bib_detail" width="100%" border="0" cellspacing="1" cellpadding="2" class="bibDetail">
    <tr class="bibInfoEntry">
      <td>
        <table width="100%" cellspacing="3" cellpadding="0">
          <tr>
            <td class="bibInfoLabel">Author</td>
            <td class="bibInfoData"><a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a></td>
          </tr>

          <tr>
            <td class="bibInfoLabel">Title</td>
            <td class="bibInfoData"><strong>Blood ties / Garth Nix and Sean Williams.</strong></td>
          </tr>

          <tr>
            <td class="bibInfoLabel">Publication Info.</td>
            <td class="bibInfoData">New York, NY : Scholastic Inc., 2014.</td>
          </tr>

          <tr>
            <td class="bibInfoLabel">Edition</td>
            <td class="bibInfoData">First edition.</td>
          </tr>
        </table>
      </td>
    </tr>
  </table>

Note: In a pinch, you can use the browser development tools to view and edit the source of the web page (CTRL-Shift-i in Chrome or Firefox, in the Elements or Inspector tab respectively).

Extract and view structured data in HTML

There are a number of RDFa parsers, both online and locally installable, that can help you check the results of your work. Copy and paste the HTML source into each of the following online structured data extraction tools:

The results should (not surprisingly!) show that the page currently contains no structured data.

Add the RDFa vocabulary declaration

Preamble

RDFa (Resource Description Framework in Attributes) enables us to embed descriptions of things (types) and their properties within HTML documents using just a handful of HTML attributes.

To avoid a tower of Babel situation where one person uses the type name "author" to refer to the same concept that someone else calls a "writer", collections of types and their properties are typically standardized and published as a vocabulary (also known as an ontology).

Each type and property is expected to have a dereferenceable URI so that you (or more realistically the machines) can look up the definition of the vocabulary element and determine its relationship (if any) to other vocabulary elements. For example, you can look up http://schema.org/Book and learn that it is a subclass of the Thing / CreativeWork hierarchy.

You could use the full URI for each vocabulary element, but that would be extremely verbose, especially given vocabularies that publish URIs like http://rdaregistry.info/Elements/a/countryAssociatedWithThePerson. Therefore, RDFa offers the @vocab attribute: if you add a vocab="http://<path/for/vocab>" attribute to an HTML element, an RDFa processor automatically prepends that value to the values of any @typeof and @property attributes within the element's scope.
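As a rough illustration of that prepending step, the following Python sketch treats term expansion as plain string concatenation. The function name expand_term is made up for this example, and real RDFa processors handle more cases (prefixes, reserved terms) than this shows.

```python
def expand_term(vocab: str, term: str) -> str:
    """Resolve a bare RDFa term against the in-scope @vocab value.

    Simplified assumption: a term that already looks like an absolute
    URI is returned unchanged; anything else is appended to the vocab.
    """
    if "://" in term:
        return term
    return vocab + term


print(expand_term("http://schema.org/", "Book"))  # http://schema.org/Book
print(expand_term("http://schema.org", "Book"))   # http://schema.orgBook
```

Because the value is simply concatenated, a vocabulary URI without its trailing slash produces a URI that does not identify any schema.org type.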

Declare the schema.org vocabulary as your default

We're going to use the schema.org vocabulary for our exercise, as it includes types and properties that enable us to describe many things of general interest without having to mix and match multiple vocabularies. Declare the default vocabulary for the HTML document as http://schema.org/ on the <body> element. Note: Do not forget the trailing slash (/)!

Check your markup
<!DOCTYPE html>
<html>
<head>
  <title>Las Vegas-Clark County Library District /All Locations</title>
  <style>...</style>
</head>

<body vocab="http://schema.org/">
...

Checkpoint: Your HTML page should now look like step1/check.html

Specialized versus general vocabularies

Many vocabularies focus on a particular domain; for example:

  • Friend-of-a-Friend (FOAF): social connections
  • Portable Contacts (PoCo): contact information
  • Bibliographic Ontology (Bibo): bibliographic references
  • Good Relations (gr): sales offers, orders, and agents

In practice, documents often ended up using types and properties from several different vocabularies. While vocabulary description languages like RDF Schema (RDFS) and the Web Ontology Language (OWL) offer ways to express equivalence between types and properties of different vocabularies, it can still be extremely complex to publish and consume mixed-vocabulary documents.

schema.org, on the other hand, tries to provide a vocabulary that can describe almost everything, albeit in many cases with less granularity than more specialized vocabularies.

Add the type and associated properties for your page

Preamble

Unless declared otherwise, web pages are assumed to have a type of WebPage. The choice of type is important as it dictates which properties you can "legally" use, so this section will help you find a more specific match for your purposes.

Browse the schema.org type hierarchy

The schema.org types are arranged in a top-down hierarchy. Starting at the top level of the type hierarchy, browse through the CreativeWork type hierarchy. Notice how each type inherits the properties from its parent (beginning with Thing), offers its own more specific definition for its raison d'etre, and may add its own properties to enable you to describe it more completely.
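To make the inheritance idea concrete, here is a small Python sketch that walks up the hierarchy from a type to Thing and collects the properties along the way. The property lists are a hand-picked subset of schema.org chosen for this codelab, not the full vocabulary, and the function name available_properties is invented for the example.

```python
# Hand-picked subset of the schema.org hierarchy (illustrative only).
PARENT = {"Thing": None, "CreativeWork": "Thing", "Book": "CreativeWork"}

OWN_PROPERTIES = {
    "Thing": ["name", "description", "image", "sameAs", "url"],
    "CreativeWork": ["author", "copyrightHolder", "datePublished",
                     "publisher", "typicalAgeRange"],
    "Book": ["bookEdition", "isbn", "numberOfPages"],
}


def available_properties(type_name):
    """Collect a type's own properties plus those of all its ancestors."""
    props = []
    while type_name is not None:
        props.extend(OWN_PROPERTIES[type_name])
        type_name = PARENT[type_name]
    return sorted(props)


print(available_properties("Book"))
```

A Book therefore accepts name (from Thing) and author (from CreativeWork) in addition to its own bookEdition, isbn, and numberOfPages.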

Add the type declaration for the document

To declare an RDFa type for an HTML document, add the @typeof attribute to the <body> element and set the value of the attribute to Book.

Check your markup
<!DOCTYPE html>
<html>
<head>
  <title>Las Vegas-Clark County Library District /All Locations</title>
  <style>...</style>
</head>

<body vocab="http://schema.org/" typeof="Book">
...

Checkpoint: Your HTML page should now look like step2/check_a.html

Add a name property for the type

Every schema.org type has a name property available to it, because the property is declared on the Thing type from which every other type inherits. In the case of a Book, the title of the book is mapped to its name. Go ahead and add a @property="name" attribute to the <strong> element to assert that the content of that element is the name of the book.

Note: You might be tempted to add the attribute to the <title> element of the HTML document, but this would fall outside of the scope of your @typeof attribute. And while a search engine would likely make a best guess that, if the content of the <title> and <h1> for a given web page match then that's likely the title, your explicit assertion of that property is stronger than an inference.

Note: We are working with real catalog pages pulled from the web. The <title> element in this page is not ideal, as it does not actually identify the content of the specific page.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Title</td>
    <td class="bibInfoData"><strong property="name">Blood ties / Garth Nix and Sean Williams.</strong></td>
  </tr>
...

Add an author property for the type

This book has an author, and if you check the documentation for Book you will find that there is indeed an author property. Notice that the expected type of the author property is either a Person or Organization type. For now, go ahead and add the @property="author" attribute to the <a> element for the author's name.

Note: You might be tempted to add the attribute to the <tr> element of the HTML document, but the scope of the <tr> element includes more than just the name of the author, so you would be asserting (falsely!) that the author was "Author Nix, Garth".

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Author</td>
    <td class="bibInfoData"><a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse" property="author">Nix, Garth.</a></td>
  </tr>
...

Improve the author property

Check the results from various structured data parsers. Do they match your expectations? Look closely at the author value; you probably did not expect the value of the author property to be a URL. This is one of the subtleties of RDFa: <a> elements are special, in that the href attribute value, rather than the content of the <a> element, is used as the RDFa property value.

Let's fix that: move the @property="author" attribute to the td element that surrounds the a element. Run your structured data parsers again to ensure that you're getting the results that you expect.
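The difference between the two placements can be sketched with a toy extractor built on Python's standard html.parser module. This is not a conforming RDFa processor: it ignores @vocab, @typeof, and @resource, and only applies the simplified rule that an explicit content attribute or an href/src attribute supplies the value, with the element's text used otherwise. The class name PropertyValues and the shortened href /search/nix are made up for the example.

```python
from html.parser import HTMLParser

VOID = {"img", "link", "meta", "br", "hr", "input"}  # elements with no end tag


class PropertyValues(HTMLParser):
    """Toy extractor of (property, value) pairs; NOT a real RDFa parser."""

    def __init__(self):
        super().__init__()
        self.values = []   # (property, value) pairs found so far
        self._stack = []   # per open element: None or [property, text parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        entry = None
        if prop is not None:
            # Simplified rule: content wins, then href/src, else text content.
            for name in ("content", "href", "src"):
                if name in attrs:
                    self.values.append((prop, attrs[name]))
                    break
            else:
                entry = [prop, []]
        if tag not in VOID:
            self._stack.append(entry)

    def handle_data(self, data):
        # Text belongs to every open element that is collecting it.
        for entry in self._stack:
            if entry is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        entry = self._stack.pop()
        if entry is not None:
            self.values.append((entry[0], "".join(entry[1]).strip()))


def extract(html):
    parser = PropertyValues()
    parser.feed(html)
    parser.close()
    return parser.values


on_anchor = '<a href="/search/nix" property="author">Nix, Garth.</a>'
on_cell = '<td property="author"><a href="/search/nix">Nix, Garth.</a></td>'
print(extract(on_anchor))  # [('author', '/search/nix')]
print(extract(on_cell))    # [('author', 'Nix, Garth.')]
```

With the attribute on the <a> element the value is the link target; with the attribute on the surrounding td element the value is the visible name.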

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Author</td>
    <td class="bibInfoData" property="author"><a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a></td>
  </tr>
...

Add a datePublished property for the type

Right now a date of publication is visible on the page, but as the data just lives inside an undifferentiated string of text, it would be difficult for a machine to know what the data means. To remove this uncertainty, wrap the date in a <time> tag and add the @property="datePublished" attribute.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Publication Info.</td>
    <td class="bibInfoData">New York, NY : Scholastic Inc., <time property="datePublished">2014</time>.</td>
  </tr>
...

Checkpoint: Your HTML page should now look like step2/check_b.html

Add an image property for the Book type

Every type in schema.org can have an image property. Search engines might use the image property to choose the appropriate image from a page that contains multiple images, and so provide a more visually attractive search result. Your catalog page contains an image. Add the @property="image" attribute to the <img> element.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
  <div id="coverImage">
    <div class="jacket"><img src="http://goo.gl/cYK6J0" property="image" border="0"></div>

    <div class="bibMedia"><img src="/screens/media_book.gif" alt="BOOKS"></div>
  </div>
...

Add book-specific properties to the Book entity

When you look at the documentation for the schema.org Book type, one of the properties that is specific to the Book type is the bookEdition property--and our sample book says that it is a first edition, which just might be of interest to researchers. Add the @property="bookEdition" attribute to the corresponding td element.

Repeat for the isbn and numberOfPages properties.

Note: schema.org processors in particular understand that this level of granularity is not always possible in practice, and will do the best they can with the data they receive. So if the best you can do is mark the value of an ISBN in your web page as 9780545522458 (hbk.) : $12.99 instead of just the actual ISBN itself, processors may still be able to work out the actual value.
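How any particular processor recovers the ISBN is not documented here, so the following Python sketch only shows one plausible heuristic: pull the first run of 13 digits, or 9 digits followed by a digit or X, out of the surrounding text. The pattern and the function name find_isbn are assumptions for illustration.

```python
import re

# Assumption: an ISBN appears as an unhyphenated 13-digit run, or as
# 9 digits followed by a final digit or X (the ISBN-10 form).
ISBN_PATTERN = re.compile(r"\b(?:\d{13}|\d{9}[\dXx])\b")


def find_isbn(text):
    """Return the first ISBN-like token in text, or None if there is none."""
    match = ISBN_PATTERN.search(text)
    return match.group(0) if match else None


print(find_isbn("9780545522458 (hbk.) : $12.99"))  # 9780545522458
print(find_isbn("0545522455 (hbk.) : $12.99"))     # 0545522455
```

Marking up only the ISBN itself, where your page structure allows it, spares consumers from having to guess in this way.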

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
    <tr>
      <td class="bibInfoLabel">Edition</td>
      <td class="bibInfoData" property="bookEdition">First edition.</td>
    </tr>
...
    <tr>
      <td class="bibInfoLabel">Description</td>
      <td class="bibInfoData"><span property="numberOfPages">186</span> pages : illustration ; 22 cm.</td>
    </tr>
...
    <tr>
      <td class="bibInfoLabel">ISBN</td>
      <td class="bibInfoData" property="isbn"><a href=
      "/search~S12?/i9780545522458+%28hbk.%29+%3A/i9780545522458hbk/-3,-1,0,E/2browse">
      9780545522458 (hbk.) : $12.99</a></td>
    </tr>

    <tr>
      <td></td>
      <td class="bibInfoData" property="isbn"><a href=
      "/search~S12?/i0545522455+%28hbk.%29+%3A/i0545522455hbk/-3,-1,0,E/2browse">
      0545522455 (hbk.) : $12.99</a></td>
    </tr>
...

Add a description property

You might have noticed that some of the RDFa parsers generate a rich snippet that shows you what your page might look like as a search result. You may also have noticed that the rich snippet did not contain much content of your page other than its title. To help search engines generate a better rich snippet, you should include a @property="description" attribute in your web page.

Find the Summary section of the page, which provides a nice description of the book, and add the @property="description" attribute to the appropriate td element.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
    <tr>
      <td class="bibInfoLabel">Summary</td>
      <td class="bibInfoData" property="description">As the Conquerors try to
      destroy Erdas, Meilin -- fed up with waiting and
      ready to fight -- sets off into enemy territory with
      her spirit animal, a panda named Jhi. Her friends
      Conor and Abeke aren't far behind ... but they're not
      the only ones.</td>
    </tr>
...

Add an implicit property

It can be helpful for books to include an indication of their intended audience. For example, this fictional book is intended (according to the publisher) for grades 3-7. Fortunately, schema.org offers the typicalAgeRange property for this purpose on the CreativeWork type and its children. However, your page does not include an obvious place to attach this markup.

When you realize that a vocabulary has pointed out a possible deficiency in your work, you could revisit the web page and add an "Age Range" field that you could then use to classify all of your work. In this step, assume that you are working with a strict designer who forbids you from altering the look or content of the page. In that situation, your only option is to use a <meta> element to define the property value for the machines.

Go ahead and add <meta property="typicalAgeRange" content="8-12"> anywhere within the scope of the Book. The solutions add the element directly under the <body> element.

Note: Do not use this approach as a license to stuff your web page full of lascivious keywords that have no connection to your content in the hopes of drawing a larger audience to your site. The search engines learned about this "spiderfood" tactic back in the 90's and will punish your site mercilessly with low relevancy ranking if you are determined to have been trying to game their systems. The generally accepted best practice is to try to only add machine-readable markup to the same content that humans can see. Reserve <meta> elements only for the most important purposes.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
  <meta property="typicalAgeRange" content="8-12">
...

Checkpoint: Your HTML page should now look like step2/check_c.html

Lessons learned

In this exercise, you learned:

Embedded types

So far you have described the page using a single type and a handful of properties. However, when you added the @property="author" attribute, the expected value for the property (the range) was not a simple text string; it was supposed to be an entity of either the Person or the Organization type.

In this exercise, you will add several embedded entities to the page to conform to the vocabulary definition and make your structured data even more useful.

Continue working with the HTML file that you have been editing so far, or for a fresh start, copy step2/check_c.html into a new file.

Define the Person entity

Your @property="author" attribute needs to define a Person entity to satisfy the expected value of author. Simply add the @typeof="Person" attribute to the same HTML element so that you are, in one step, defining the author property for the overall Book entity, while simultaneously starting a new Person entity scope.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Author</td>
    <td class="bibInfoData" property="author" typeof="Person">
      <a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a>
    </td>
  </tr>
...

Define basic properties of the Person entity

Now that you have defined a Person entity, you can define specific properties for it.

Declare that the person's name is the name property of the Person entity.

Tip: Remember that you might need to add <span> tags to create a new scope for the properties that you want to add.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Author</td>
    <td class="bibInfoData" property="author" typeof="Person">
      <span property="name">
        <a href="/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a>
      </span>
    </td>
  </tr>
...

Declare that the same Person is both the author and copyrightHolder

Copyright is an important subject for both creators and organizations and individuals seeking to reuse or republish work, so naturally schema.org includes a copyrightHolder property that you can apply. In this case, however, the author and the copyrightHolder are one and the same, and you have already used the @property attribute.

To define multiple property values for the same attribute, simply include the values as a whitespace-delimited list. In this case, edit the HTML to declare @property="author copyrightHolder" and check your work in one or more structured data validators.

Note: These are still relatively early days for structured data validators, and their output varies for more esoteric cases like multi-valued attributes. For example, the Structured Data Linter recognizes the second value for copyrightHolder but generates a "blank node" identifier for it, whereas Google's Structured Data Testing Tool only recognizes the last value of the multi-valued attribute. To complicate matters further, the search engines recognize that their tools have bugs that differ from what their actual production parser understands... so don't be overly alarmed if it seems like your markup is not being recognized by the testing tool.
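Conceptually, a multi-valued @property attribute yields one statement per term, all pointing at the same value. The Python sketch below shows that expansion; the function name statements and the identifiers #book and _:person (standing in for the Book and for the not-yet-named Person entity) are placeholders for this example.

```python
def statements(subject, property_attr, value):
    """Expand a whitespace-delimited @property value into one triple per term."""
    return [(subject, term, value) for term in property_attr.split()]


for triple in statements("#book", "author copyrightHolder", "_:person"):
    print(triple)
# ('#book', 'author', '_:person')
# ('#book', 'copyrightHolder', '_:person')
```

A validator that reports only one of the two statements is showing you a limitation of the tool rather than a problem with the markup.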

Use the @resource attribute to group assertions for an entity

Sometimes your HTML document does not group all of the content in such a way that you can cleanly keep all of the attributes for a given instance of an entity within a single scope. In these cases, you may be able to use the @resource attribute to logically group the properties for that instance.

For example, when you added the @typeof="Person" declaration for the author, the name of the author was separated from your existing Person instance by the <a> element in the middle. The new scope that that <a> introduces makes it a bit more difficult to mark up the familyName and givenName of the author.

To resolve the problem, add a @resource attribute to your existing Person declaration. The value of the new attribute should be unique on this page; use #author1 for the sake of simplicity.

Then add a wrapping <span> element around the name of the author inside the a element, including a @resource attribute with a value of #author1 to match what you added above. This creates a new scope for the existing entity, such that any properties declared within this new scope will be added to that entity.

Now add another <span> element inside the newly scoped #author1 resource, and declare it to be the name property. For bonus points, you can nest the givenName and familyName properties inside of the name property.
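One way to picture what @resource does: every property is filed under the identifier of the scope it was declared in, so two separate scopes that share an identifier end up describing a single entity. The following Python sketch models that with a dictionary; the function name group_by_resource and the flat (resource, property, value) input format are assumptions made for the illustration.

```python
from collections import defaultdict


def group_by_resource(scoped_properties):
    """Merge (resource, property, value) tuples into one record per resource."""
    entities = defaultdict(dict)
    for resource, prop, value in scoped_properties:
        entities[resource][prop] = value
    return dict(entities)


found = [
    ("#author1", "type", "Person"),      # from the td element
    ("#author1", "familyName", "Nix"),   # from the span inside the <a>
    ("#author1", "givenName", "Garth"),  # from the span inside the <a>
]
print(group_by_resource(found))
```

Although the declarations sit in different elements, they all land in the single #author1 record.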

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <tr>
    <td class="bibInfoLabel">Author</td>
    <td class="bibInfoData" property="author copyrightHolder" typeof="Person" resource="#author1">
      <a href="/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">
        <span resource="#author1">
          <span property="name">
            <span property="familyName">Nix</span>,
            <span property="givenName">Garth</span>
          </span>
       </span>.</a>
    </td>
  </tr>
...

Improve your ISBN markup

Note: Now that you know about @resource, you can improve the granularity of your ISBN markup. Add a resource value for the Book entity and make your isbn properties refer to the Book entity.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <tr>
    <td class="bibInfoLabel">ISBN</td>
    <td class="bibInfoData"><a href=
    "/search~S12?/i9780545522458+%28hbk.%29+%3A/i9780545522458hbk/-3,-1,0,E/2browse">
    <span resource="#book"><span property="isbn">9780545522458</span></span> (hbk.) : $12.99</a></td>
  </tr>

  <tr>
    <td></td>
    <td class="bibInfoData"><a href=
    "/search~S12?/i0545522455+%28hbk.%29+%3A/i0545522455hbk/-3,-1,0,E/2browse">
    <span resource="#book"><span property="isbn">0545522455</span></span> (hbk.) : $12.99</a></td>
  </tr>

...

Describe the publisher as an Organization type

So far we have not provided any value for the publisher property, which tends to be important for creative works. The publisher documentation shows that the expected range is Organization, which in turn has child types such as Corporation.

  1. Define a new Corporation entity with the name of the publisher as the name property.
  2. Add a location property for the Corporation entity. Notice that the expected range is a type of either Place or PostalAddress. Use a PostalAddress entity, filling in the addressLocality and addressRegion properties.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <tr>
    <td class="bibInfoLabel">Publication Info.</td>
    <td class="bibInfoData">
      <span property="publisher" typeof="Corporation" resource="#publisher">
        <span property="location" typeof="PostalAddress">
          <span property="addressLocality">New York</span>,
          <span property="addressRegion">NY</span> : 
        </span>
        <span property="name">Scholastic Inc.</span>, 
      </span>
      <time property="datePublished">2014</time>.
    </td>
  </tr>
...

Bonus exercise: Add a second author of type Person

There is a second author for this book, Sean Williams, that should be reflected in the machine-readable markup.

  1. Define the new Person entity with the name, givenName, and familyName properties.
  2. Define the birth date of this author using the birthDate property.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
  <td class="bibInfoData"><span property="author" typeof="Person" resource="#author2">
    <a href="/search~S12?/aWilliams%2C+Sean%2C+1967-/awilliams+sean+1967/-3,-1,0,B/browse">
      <span resource="#author2">
        <span property="name">
          <span property="familyName">Williams</span>,
          <span property="givenName">Sean</span>
        </span>,
        <span property="birthDate">1967</span>-
      </span>
    </a>
  </span></td>
...

Checkpoint: Your HTML page should now look like step3/check_d.html

Lessons learned

In this exercise, you learned:

Strings to things

So far you have described the page using types and properties that are inside the page itself. But if you have to update some information that is common to many of your pages, that could be painful to roll out... and even if you have an automated process for updating that information across all of your pages, there is no guarantee that anything extracting data from your site will extract all of the updates at one time.

Fortunately, the problem of providing one copy of information on the web was solved at the same time the web was created: via the simple power of the link! And structured data is no different; in fact, linked data is a term that has emerged over the past few years marking a more pragmatic approach to building a web of structured data than the somewhat classically academic semantic web.

The following principles of linked data were first articulated by Tim Berners-Lee in a 2006 design note:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that they can discover more things.

Keep these principles in mind as you work through the following steps!

Continue working with the HTML file that you have been editing so far, or for a fresh start, copy step3/check_d.html into a new file.

Link the authors to external pages

There are many sources of identifiers for people on the web. Some sources that you may find familiar include:

Assuming your underlying system has the ability to store and express identifiers, you can help the machines disambiguate and retrieve more information about your authors by linking to their identifiers from your catalog page. Use the sameAs property to add links from your simple text representation of the authors of this book to external resources.

Hint: To save you time in looking up identifiers, here are a set for Garth Nix:

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <td class="bibInfoData" property="author copyrightHolder" typeof="Person" resource="#author1">
    <a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">
      <span resource="#author1">
        <link property="sameAs" href="http://id.loc.gov/authorities/names/n96003050">
        <link property="sameAs" href="http://viaf.org/viaf/39520118">
        <link property="sameAs" href="http://dbpedia.org/resource/Garth_Nix">
        <link property="sameAs" href="http://www.freebase.com/m/01qqfy">

        <span property="name">
          <span property="familyName">Nix</span>,
          <span property="givenName">Garth</span>
        </span>
     </span>.</a>
   </td>
...

Note: While it might be tempting to use the url property, that is normally reserved for linking to a URL where the thing that is described is available (for example, linking to a downloadable podcast or e-book). In contrast, sameAs is used to link to a description of the thing.

Create separate pages for the authors in your own system

Take a look at how the page has developed over time; there is now a lot of HTML markup just to describe the author, and you can imagine more markup if you were to express all of the "see from" and "see also" forms that might be contained in a local authority record. If your system uses local authority records, in fact, they are a perfect candidate for refactoring your markup. You can move the bulk of the markup from the bibliographic record display page into a separate page about the author, built on your local authority record. Once it is a separately displayed page, you can simply link to it from this page... as well as from any other pages that want to provide information about this author.

Create a new file named garthNix.html in your text editor, and copy the @resource="#author1" markup into the file.

As the new file describes a single type, you can move the declaration of the type into the <body> element of the new page, and you can (optionally) remove the @resource attributes from the markup that you pasted into the file. Don't forget the @vocab declaration! Use your existing page as a template. Use the RDFa parsers to ensure that the markup in the new file expresses the same information as it did in the original file.

Repeat these steps to create seanWilliams.html, using the @resource="#author2" markup as the source of interest.

Check your garthNix.html
<!DOCTYPE html>
<html>
<head>
  <title>Garth Nix</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <link property="sameAs" href="http://id.loc.gov/authorities/names/n96003050">
  <link property="sameAs" href="http://viaf.org/viaf/39520118">
  <link property="sameAs" href="http://dbpedia.org/resource/Garth_Nix">
  <link property="sameAs" href="http://www.freebase.com/m/01qqfy">
  <span property="name"><span property="familyName">Nix</span>, <span property="givenName">Garth</span></span>.
</body>
</html>

Check your seanWilliams.html
<!DOCTYPE html>
<html>
<head>
  <title>Sean Williams, 1967-</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <link property="sameAs" href="http://id.loc.gov/authorities/names/nr97009613">
  <link property="sameAs" href="http://viaf.org/viaf/102404013">
  <link property="sameAs" href="http://dbpedia.org/resource/Sean_Williams_(author)">
  <link property="sameAs" href="http://www.freebase.com/m/06z9bf">
  <span property="name"><span property="familyName">Williams</span>,
    <span property="givenName">Sean</span>
  </span>,<span property="birthDate">1967</span>-
</body>
</html>

Link to the author page

Now, replace the inline markup in the original page with a simple link to your new file. You still want to state that "Author Name" is the author of the book using the @property="author" assertion, but now you can either add that property directly to an <a> element that links to your new file, or use the @resource attribute to link to the external file instead of the internal markup. This is a signal to any RDFa parser that the linked resource contains the data for the named property.

Note: "when the element contains the href (or src) attribute, @property is automatically associated with the value of this attribute rather than the textual content of the <a> element" (Adida, Ben; Birbeck, Mark; Herman, Ivan; Sporny, Manu. RDFa 1.1 Primer - Second edition). Using a @property attribute on the same element as a @resource attribute works in a similar fashion; the target of the @resource attribute is used as the value of the @property attribute.
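A relative @resource value such as seanWilliams.html#person is resolved against the URL of the page that contains it, just like a relative href. The Python sketch below uses the standard urllib.parse.urljoin function to show the resulting identifiers; the base URL http://catalog.example.org/record/rdfa_book.html is a made-up location for the catalog page.

```python
from urllib.parse import urljoin

# Hypothetical URL at which the catalog page is published.
BASE = "http://catalog.example.org/record/rdfa_book.html"

# An identifier local to the page keeps the page's own URL.
print(urljoin(BASE, "#book"))
# http://catalog.example.org/record/rdfa_book.html#book

# A reference to the separate author page resolves to that file.
print(urljoin(BASE, "seanWilliams.html#person"))
# http://catalog.example.org/record/seanWilliams.html#person
```

The second result matches the identifier that seanWilliams.html declares for itself with resource="#person", provided both files are published side by side, which is how the two pages end up describing the same Person.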

Check your markup
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <tr>
    <td class="bibInfoLabel">
    Added Author</td>

    <td class="bibInfoData" property="author" resource="seanWilliams.html#person">
      <a href="/search~S12?/aWilliams%2C+Sean%2C+1967-/awilliams+sean+1967/-3,-1,0,B/browse">
        Williams, Sean, 1967-
      </a>
    </td>
  </tr>

Checkpoint: Your original HTML page should now look like step4/check_e.html and your new author HTML pages should look like step4/garthNix.html and step4/seanWilliams.html.

Augment the author page

Now that you have created an entirely separate author page, you can add much more information about the author; for example, you can include an email address, links to their personal web sites and social media accounts, a list of their publications and previous talks... far more information than you would have wanted to publish inline in the catalog page itself.

Following the principles of linked data can lead not only to more efficient maintenance of information and (potentially) more useful results in search engines and other aggregators of data, but also to a better information design and experience for your users.

Use the Person properties to flesh out the "about this author" page with properties such as address, birthDate, email, follows, and telephone. Be adventurous, and remember to try to use nested types and ranges appropriately!
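
A minimal sketch of what an augmented seanWilliams.html might contain follows. The name and birth year come from the catalog record; the email address and postal address are placeholder values invented for illustration, and the choice of elements (h1, p, time, span) is an assumption rather than the layout of the solution file. Note the nested PostalAddress type used for the address property:

<!DOCTYPE html>
<html>
<head>
  <title>Sean Williams</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <h1 property="name">Sean Williams</h1>
  <p>Born: <time property="birthDate" datetime="1967">1967</time></p>
  <!-- placeholder email address, not the author's real one -->
  <p>Email: <span property="email">author@example.org</span></p>
  <!-- placeholder address, shown only to illustrate a nested type -->
  <p property="address" typeof="PostalAddress">
    <span property="addressLocality">Example City</span>,
    <span property="addressCountry">Example Country</span>
  </p>
</body>
</html>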

Lessons learned

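In this exercise, you learned how to move an author description into a separate page, link to it from the book record with the property and resource attributes, and extend it with additional Person properties.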

Subject headings

In this exercise, you will mark up subject headings. The first approach treats subject headings simply as keywords, which is appropriate for library systems that do not control subject headings or which do not expose the source for the subject headings. Then we will embellish our markup by treating the subject headings as part of an externally controlled vocabulary.

Marking up subject headings as keywords

Identifying subject headings in the catalog page as simple text keywords can help a search engine boost relevance based on those keywords, rather than relying on arbitrary text within the web page.

Find the subject headings in the page, mark them up using the schema.org keywords property, and check your work.

Check your markup
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <tr>
    <td class="bibInfoLabel">Subject</td>

    <td class="bibInfoData" property="keywords">
      <a href="/search~S12?/foo">Human-animal relationships -- Juvenile fiction.</a>
    </td>
  </tr>

Marking up subject headings as things

While simple text keywords can be useful, we have learned that by linking to external entities, machines can disambiguate text and connect our work to the broader cloud of linked data.

Find matches for the subject headings in the page at http://id.loc.gov and mark them up as external entities. This time, use the about property, which schema.org defines as "The subject matter of the content", making it perfect for our purposes. Then check your work.

Note: You can also continue to mark up the text of the subject headings as keywords, if you like; these approaches are compatible and different clients may use different approaches to consuming the data that you offer.

Check your markup
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <tr>
    <td class="bibInfoLabel">Subject</td>

    <td class="bibInfoData" property="keywords">
      <link property="about" href="http://id.loc.gov/authorities/subjects/sh2008122001">
      <a href="/search~S12?/foo">Human-animal relationships -- Juvenile fiction.</a>
    </td>
  </tr>
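
If you want the heading text to travel with the linked entity, one possible variation is to describe the subject as a nested entity with its own name. This is a sketch only: the choice of typeof="Thing" is an assumption, and the name property sits on a <span> wrapping the link because a property placed on the <a> element itself would take the href value rather than the text:

  <tr>
    <td class="bibInfoLabel">Subject</td>

    <!-- assumption: the subject heading is typed as a generic schema.org Thing -->
    <td class="bibInfoData" property="about" typeof="Thing"
        resource="http://id.loc.gov/authorities/subjects/sh2008122001">
      <span property="name"><a href="/search~S12?/foo">Human-animal relationships -- Juvenile fiction.</a></span>
    </td>
  </tr>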

Checkpoint: Your original HTML page should now look like step4/check_f.html.

Lessons learned

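In this exercise, you learned how to mark up subject headings as simple text keywords, and how to link them to entities in an externally controlled vocabulary using the about property.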

Linking to other bibliographic descriptions

We have already seen that entities such as authors and subject headings often have other representations on the web to which we can connect our own descriptions.

Linking to a Freebase edition

Freebase is a source of linked open data that uses its own schema to represent entities, including books to which specific editions are attached in a quasi-FRBR fashion.

  1. Search for the corresponding book edition at freebase.com; the ISBN13 is a good entry point.
  2. The resulting URL http://www.freebase.com/m/01069fkb is a human-readable representation. If you read the Freebase documentation, you will eventually find that you can link to a machine-readable RDF representation by taking the Freebase topic ID (m/01069fkb) and appending it to https://www.googleapis.com/freebase/v1/rdf/.
  3. Use a sameAs property to link your book to the Freebase edition.

Check your markup
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
  <link property="sameAs" href="https://www.googleapis.com/freebase/v1/rdf/m/01069fkb">
...

Linking to an OCLC Bibliographic Work Description

OCLC recently announced that it had published 197 million open bibliographic work descriptions.

  1. Search for the corresponding book at worldcat.org; the ISBN13 is a good entry point.
  2. Scroll down to the Linked Data section of the WorldCat details page and click on the + to display the contents.
  3. Find the schema:exampleOfWork property and the link to the OCLC Work Entity that it points to. The link redirects to an experimental URL, but you can still copy and paste it into your book record. You can also find the non-experimental URL at the top of the Work Description page. Aside: exampleOfWork has been proposed as a schema.org extension, but has not yet been accepted.
  4. Use a sameAs property to link your book to the OCLC Work Description.

Check your markup
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
  <link property="sameAs" href="https://www.googleapis.com/freebase/v1/rdf/m/01069fkb">
  <link property="sameAs" href="http://worldcat.org/entity/work/id/1782516719">
...

Linking to the publisher's description

You can also link to the publisher's description. Even though the publisher might not offer any linked data themselves, the URL of their description can still serve as a useful identifier for machines.

Once again, you can use the sameAs property to link your book to the publisher's description.

Check your markup
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
  <link property="sameAs" href="https://www.googleapis.com/freebase/v1/rdf/m/01069fkb">
  <link property="sameAs" href="http://worldcat.org/entity/work/id/1782516719">
  <link property="sameAs" href="http://www.scholastic.ca/books/view/spirit-animals-book-three-blood-ties">
...

Checkpoint: Your original HTML page should now look like step4/check_g.html.

Lessons learned

In this exercise, you learned how to link your bibliographic description to the bibliographic descriptions available from Freebase, OCLC, and publishers, as a means of giving machines more entry points into linked open data.

About the author

Dan Scott is a systems librarian at Laurentian University.

Informational resources

  • RDFa Lite (W3C Recommendation) - a marvel of technical writing, this is a specification written as a concise, extremely useful tutorial
  • schema.org - the source for the vocabulary types and definitions, although the examples all use microdata or JSON-LD instead of RDFa Lite
  • RDFa Primer (W3C Working Group Note) - a more in-depth RDFa tutorial that covers properties beyond RDFa Lite; the additional examples may help clarify how RDFa Lite works (really, you don't need anything beyond RDFa Lite!)
  • Heath, Tom; Bizer, Christian. Linked data: Evolving the Web into a Global Space - a book (freely available on the web) that goes into depth to cover the principles, patterns, and best practices for publishing linked data on the web

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.