In this codelab, you're going to take a catalog page that describes a book and enhance it so that it contains structured data. You will use the schema.org vocabulary and express it via RDFa attributes.
Audience: Beginner
Prerequisites: To
complete this codelab, you will need a basic familiarity with HTML. The
exercises can be found in codelab.zip,
with the solutions found in the rdfa_exercises
subdirectory. There are
frequent checkpoints throughout the codelab, so if you get stuck at any point,
you can use the checkpoint file to resume and work through this codelab
at your own pace.
In this exercise, you will learn the basic steps required to add simple RDFa structured data to an existing library catalog page for a book.
Open step1/rdfa_book.html
in a text
editor. You should see something like the following HTML source for the
web page:
<!DOCTYPE html>
<html>
<head>
<title>Las Vegas-Clark County Library District /All Locations</title>
<style>...</style>
</head>
<body>
<div id="coverImage">
<div class="jacket"><img src=
"http://store.scholastic.com/content/stores/media/products/58/9780545522458_default_pdp.gif"
border="0"></div>
<div class="bibMedia"><img src="/screens/media_book.gif" alt="BOOKS"></div>
</div>
<table id="bib_detail" width="100%" border="0" cellspacing="1" cellpadding="2" class="bibDetail">
<tr class="bibInfoEntry">
<td>
<table width="100%" cellspacing="3" cellpadding="0">
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData"><a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a></td>
</tr>
<tr>
<td class="bibInfoLabel">Title</td>
<td class="bibInfoData"><strong>Blood ties / Garth Nix and Sean Williams.</strong></td>
</tr>
<tr>
<td class="bibInfoLabel">Publication Info.</td>
<td class="bibInfoData">New York, NY : Scholastic Inc., 2014.</td>
</tr>
<tr>
<td class="bibInfoLabel">Edition</td>
<td class="bibInfoData">First edition.</td>
</tr>
</table>
</td>
</tr>
</table>
Note: In a pinch, you can use the browser development tools to
view and edit the source of the web page (CTRL-Shift-i
in
Chrome or Firefox, in the Elements or Inspector tab
respectively).
There are a number of RDFa parsers, both online and locally installable, that can help you check the results of your work. Copy and paste the HTML source into each of the following online structured data extraction tools:
The results should (not surprisingly!) show that the page currently contains no structured data.
RDFa (Resource Description Framework in Attributes) enables us to embed descriptions of things (types) and their properties within HTML documents using just a handful of HTML attributes.
To avoid a tower of Babel situation where one person uses the type name "author" to refer to the same concept that someone else calls a "writer", collections of types and their properties are typically standardized and published as a vocabulary (also known as an ontology).
Each type and property is expected to have a dereferenceable URI so that
you (or more realistically the machines) can look up the definition of
the vocabulary element and determine its relationship (if any) to other
vocabulary elements. For example, you can look up
http://schema.org/Book
and learn that it is a subclass of the Thing / CreativeWork
hierarchy.
You could use the full URI for each vocabulary element, but that would
be extremely verbose - especially given vocabularies that publish URIs
like http://rdaregistry.info/Elements/a/countryAssociatedWithThePerson.
Therefore, RDFa offers the @vocab
attribute; if you
add a vocab="http://<path/for/vocab>"
attribute to an HTML element, the specified value is automatically
prepended to the values of any RDFa @typeof
and
@property
attributes within its scope.
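To make the saving concrete, here is a minimal sketch that compares the two styles. The wrapping <div> and <span> elements and the shortened title are assumptions for illustration only; they are not part of the catalog page.

```html
<!-- Without @vocab: every type and property is written as a full URI -->
<div typeof="http://schema.org/Book">
  <span property="http://schema.org/name">Blood ties</span>
</div>

<!-- With @vocab: the vocabulary URI is prepended to "Book" and "name" -->
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Blood ties</span>
</div>
```

Both fragments should describe a Book with the same name; the second is simply shorter and easier to maintain as the number of properties grows.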
We're going to use the schema.org vocabulary for our exercise, as it
includes types and properties that enable us to describe many things of
general interest without having to mix and match multiple vocabularies.
Declare the default vocabulary for the HTML document
as http://schema.org/
on the <body>
element.
Note: Do not forget the trailing slash (/
)!
<!DOCTYPE html>
<html>
<head>
<title>Las Vegas-Clark County Library District /All Locations</title>
<style>...</style>
</head>
<body vocab="http://schema.org/">
...
Checkpoint: Your HTML page should now look like step1/check.html
Many vocabularies focus on a particular domain; for example:
In practice, documents often ended up using types and properties from several different vocabularies. While vocabulary description languages like RDF Schema (RDFS) and the Web Ontology Language (OWL) offer ways to express equivalence between types and properties of different vocabularies, it can still be extremely complex to publish and consume mixed-vocabulary documents.
schema.org, on the other hand, tries to provide a vocabulary that can describe almost everything, albeit in many cases with less granularity than more specialized vocabularies.
Unless declared otherwise, web pages are assumed to have a type of WebPage. The choice of type is important as it dictates which properties you can "legally" use, so this section will help you find a more specific match for your purposes.
The schema.org types are arranged in a top-down hierarchy. Starting at
the top level of the type
hierarchy, browse through the CreativeWork
type hierarchy. Notice how each type inherits the properties from its parent
(beginning with Thing
), offers its own more specific definition
for its raison d'etre, and may add its own properties to enable you
to describe it more completely.
To declare an RDFa type for an HTML document, add the
@typeof
attribute to the <body>
element
and set the value of the attribute to Book
.
<!DOCTYPE html>
<html>
<head>
<title>Las Vegas-Clark County Library District /All Locations</title>
<style>...</style>
</head>
<body vocab="http://schema.org/" typeof="Book">
...
Checkpoint: Your HTML page should now look like step2/check_a.html
Every schema.org type has a name
property available to it, because the
property is declared on the Thing
type from which every other type
inherits. In the case of a Book
, the title of the book
is mapped to its name. Go ahead and add a @property="name"
attribute to the <strong>
element to assert that
the content of that element is the name of the book.
Note: You might be tempted to add the attribute to the
<title>
element of the HTML document, but this would
fall outside of the scope of your @typeof
attribute. And
while a search engine would likely make a best guess that, if the
content of the <title>
and <h1>
for a given web page match then that's likely the title, your explicit
assertion of that property is stronger than an inference.
Note: We are working with real catalog pages pulled from the
web. The <title>
element in this page is not
ideal, as it does not actually identify the content of the specific page.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Title</td>
<td class="bibInfoData"><strong property="name">Blood ties / Garth Nix and Sean Williams.</strong></td>
</tr>
...
This book has an author, and if you check the documentation for Book you will find that
there is indeed an author
property. Notice that the expected
type of the author
property is either a Person
or Organization
type. For now, go ahead and add the
@property="author"
attribute to the <a>
element for the author's name.
Note: You might be tempted to add the attribute to the
<tr>
element of the HTML document,
but the scope of the <tr>
element includes more
than just the name of the author, so you would be asserting (falsely!)
that the author was "Author Nix, Garth".
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData"><a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse" property="author">Nix, Garth.</a></td>
</tr>
...
Check the results from various structured data parsers. Do they match
your expectations? Look closely at the author
value; you
probably did not expect the value of the author
property
to be a URL. This is one of the subtleties of RDFa; <a>
elements are special, in that the href
attribute value is
used for an RDFa property value rather than the content of the
<a>
element.
Let's fix that: move the @property="author"
attribute to the td
element that surrounds the a
element. Run your structured data
parsers again to ensure that you're getting the results that you expect.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData" property="author"><a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a></td>
</tr>
...
Right now a date of publication is visible on the page, but as the
data just lives inside an undifferentiated string of text, it would
be difficult for a machine to know what the data means. To remove
this uncertainty, wrap the date in a <time>
tag and add the @property="datePublished"
attribute.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Publication Info.</td>
<td class="bibInfoData">New York, NY : Scholastic Inc., <time property="datePublished">2014</time>.</td>
</tr>
...
Checkpoint: Your HTML page should now look like step2/check_b.html
Every type in schema.org can have an image
property. One
potential use case for search engines is to use the image
property to guide the search engine to choose the appropriate image
from a page that might contain multiple images to provide a more
visually attractive search result. Your catalog page contains an image.
Add the @property="image"
attribute to the
<img>
element.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
<div id="coverImage">
<div class="jacket"><img src="http://goo.gl/cYK6J0" property="image" border="0"></div>
<div class="bibMedia"><img src="/screens/media_book.gif" alt="BOOKS"></div>
</div>
...
When you look at the documentation for the schema.org Book type, one of the properties that is
specific to the Book
type is the bookEdition
property--and our sample book says that it is a first edition, which just might
be of interest to researchers. Add the @property="bookEdition"
attribute to the corresponding td
element.
Repeat for the isbn
and numberOfPages
properties.
Note:
schema.org processors in particular understand that this level of granularity
is not always possible in practice, and will do the best they can with the data
they receive. So if the best you can do is mark the value of an ISBN in your
web page as 9780545522458 (hbk.) : $12.99
instead of just the
actual ISBN itself, processors may still be able to work out the actual value.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Edition</td>
<td class="bibInfoData" property="bookEdition">First edition.</td>
</tr>
...
<tr>
<td class="bibInfoLabel">Description</td>
<td class="bibInfoData"><span property="numberOfPages">186</span> pages : illustration ; 22 cm.</td>
</tr>
...
<tr>
<td class="bibInfoLabel">ISBN</td>
<td class="bibInfoData" property="isbn"><a href=
"/search~S12?/i9780545522458+%28hbk.%29+%3A/i9780545522458hbk/-3,-1,0,E/2browse">
9780545522458 (hbk.) : $12.99</a></td>
</tr>
<tr>
<td></td>
<td class="bibInfoData" property="isbn"><a href=
"/search~S12?/i0545522455+%28hbk.%29+%3A/i0545522455hbk/-3,-1,0,E/2browse">
0545522455 (hbk.) : $12.99</a></td>
</tr>
...
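If you cannot restructure the markup around the ISBN, one possible alternative is the @content attribute, which RDFa 1.1 defines as overriding the text content of the element as the property value. The following is a sketch under that assumption, reusing the first ISBN row from the page; check the output in your own parsers before relying on it.

```html
<tr>
<td class="bibInfoLabel">ISBN</td>
<!-- @content supplies the bare ISBN to machines; the visible text is unchanged -->
<td class="bibInfoData" property="isbn" content="9780545522458"><a href=
"/search~S12?/i9780545522458+%28hbk.%29+%3A/i9780545522458hbk/-3,-1,0,E/2browse">
9780545522458 (hbk.) : $12.99</a></td>
</tr>
```

The trade-off is that the machine-readable value now lives in an attribute that humans never see, so it has to be kept in sync with the displayed text.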
You might have noticed that some of the RDFa parsers generate a rich
snippet that shows you what your page might look
like as a search result. You may also have noticed that the rich snippet
did not contain much content of your page other than its title. To help
search engines generate a better rich snippet, you should include a
@property="description"
attribute in your web page.
Find the Summary section of the page, which provides a
nice description of the book, and add the @property="description"
attribute to the appropriate td
element.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Summary</td>
<td class="bibInfoData" property="description">As the Conquerors try to
destroy Erdas, Meilin -- fed up with waiting and
ready to fight -- sets off into enemy territory with
her spirit animal, a panda named Jhi. Her friends
Conor and Abeke aren't far behind ... but they're not
the only ones.</td>
</tr>
...
It can be helpful for books to include an indication of their
intended audience. For example, this fictional book is intended
(according to the publisher)
for grades 3-7. Fortunately, schema.org offers
the typicalAgeRange
property for this purpose on the
CreativeWork
type and its children. However, your page
does not include an obvious place to attach this markup.
When you realize that a vocabulary has pointed out a possible
deficiency in your work, you could revisit the web page and add an
"Age Range" field that you could then use to classify all of your
work. In this step, assume that you are working with a strict designer
who forbids you from altering the look or content of the page. In that
situation, your only option is to use a <meta>
element to define the property value for the machines.
Go ahead and add <meta property="typicalAgeRange" content="8-12">
anywhere within the scope of the Book
. The solutions
add the element directly under the <body>
element.
Note: Do not use this approach as a license to stuff your
web page full of lascivious keywords that have no connection to your
content in the hopes of drawing a larger audience to your site. The
search engines learned about this "spiderfood" tactic back in the 90's
and will punish your site mercilessly with low relevancy ranking if you
are determined to have been trying to game their systems. The generally
accepted best practice is to try to only add machine-readable markup to
the same content that humans can see. Reserve <meta>
elements only for the most important purposes.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
<meta property="typicalAgeRange" content="8-12">
...
Checkpoint: Your HTML page should now look like step2/check_c.html
In this exercise, you learned:
How to use the @content attribute to supply machine-readable versions of human-oriented data
How to use the <meta> element to supply properties that would not otherwise be part of the content
So far you have described the page using a single type and a handful
of properties. However, when you added the @property="author"
attribute, the expected value for the property (the range) was
not a simple text string; it was supposed to be an entity of either the
Person or the Organization type.
In this exercise, you will add several embedded entities to the page to conform to the vocabulary definition and make your structured data even more useful.
Continue working with the HTML file that you have been editing so far, or for a fresh start, copy step2/check_c.html into a new file.
Your @property="author"
attribute needs to define a
Person
entity to satisfy the expected value of
author
. Simply add the @typeof="Person"
attribute to the same HTML element so that you are, in one step,
defining the author
property for the overall
Book
entity, while simultaneously starting a new
Person
entity scope.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData" property="author" typeof="Person">
<a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a>
</td>
</tr>
...
Now that you have defined a Person
entity, you can define
specific properties for it.
Declare that the person's name is the name
property of the
Person
entity.
Tip: Remember that you might need to add
<span>
tags to create a new scope for the properties
that you want to add.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData" property="author" typeof="Person">
<span property="name">
<a href="/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a>
</span>
</td>
</tr>
...
Copyright is an important subject for both creators and organizations
and individuals seeking to reuse or republish work, so naturally
schema.org includes a copyrightHolder
property that you
can apply. In this case, however, the author and the copyrightHolder
are one and the same, and you have already used the
@property
attribute.
To define multiple property values for the same attribute, simply
include the values as a whitespace-delimited list. In this case, edit
the HTML to declare @property="author copyrightHolder"
and check your work in one or more structured data validators.
Note: These are still relatively early days for structured
data validators, and their output varies for more esoteric cases
like multi-valued attributes. For example, the Structured Data Linter
recognizes the second value for copyrightHolder
but
generates a "blank node" identifier for it, whereas Google's
Structured Data Testing Tool only recognizes the last value of the
multi-valued attribute. To complicate matters further, the search
engines recognize that their tools have bugs that differ from what
their actual production parser understands... so don't be overly
alarmed if it seems like your markup is not being recognized by the
testing tool.
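As a minimal sketch of the multi-valued attribute, the author row might look like the following at this point in the exercise. It assumes that you kept the Person markup from the previous steps unchanged.

```html
<tr>
<td class="bibInfoLabel">Author</td>
<!-- Two property names, separated by whitespace, both pointing at the same Person -->
<td class="bibInfoData" property="author copyrightHolder" typeof="Person">
<span property="name">
<a href="/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">Nix, Garth.</a>
</span>
</td>
</tr>
```

A parser that fully supports multi-valued attributes should report one Person entity that is the value of both the author and the copyrightHolder properties of the Book.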
Sometimes your HTML document does not group all of the content in such
a way that you can cleanly keep all of the attributes for a given
instance of an entity within a single scope. In these cases, you may be
able to use the @resource
attribute to logically group the
properties for that instance.
For example, when you added the @typeof="Person"
declaration for the author, the name of the author was separated from
your existing Person
instance by the <a>
element in the middle. The new scope that that <a>
introduces makes it a bit more difficult to mark up the
familyName
and givenName
of the author.
To resolve the problem, add a @resource
attribute
to your existing Person
declaration. The value of the new
attribute should be unique on this page; use #author1
for the sake of simplicity.
Then add a wrapping <span>
element around the
name of the author inside the a
element, including a
@resource
attribute with a value of #author1
to match
what you added above. This creates a new scope for the existing entity,
such that any properties declared within this new scope will be added to that
entity.
Now add another <span>
element inside the newly scoped
#author1
resource, and declare it to be the
name
property. For bonus points, you can
nest the givenName
and familyName
properties inside of the name
property.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData" property="author copyrightHolder" typeof="Person" resource="#author1">
<a href="/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">
<span resource="#author1">
<span property="name">
<span property="familyName">Nix</span>,
<span property="givenName">Garth</span>
</span>
</span>.</a>
</td>
</tr>
...
Note: Now that you know about @resource
, you can
improve the granularity of your ISBN markup. Add a resource value
for the Book
entity and make your isbn
properties refer to the Book
entity.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr>
<td class="bibInfoLabel">ISBN</td>
<td class="bibInfoData"><a href=
"/search~S12?/i9780545522458+%28hbk.%29+%3A/i9780545522458hbk/-3,-1,0,E/2browse">
<span resource="#book"><span property="isbn">9780545522458</span></span> (hbk.) : $12.99</a></td>
</tr>
<tr>
<td></td>
<td class="bibInfoData"><a href=
"/search~S12?/i0545522455+%28hbk.%29+%3A/i0545522455hbk/-3,-1,0,E/2browse">
<span resource="#book"><span property="isbn">0545522455</span></span> (hbk.) : $12.99</a></td>
</tr>
...
So far we have not provided any value for the publisher
property, which tends to be important for creative works. The publisher documentation shows
that the expected range is Organization
, which in turn
has child types such as Corporation
.
Create a Corporation entity with the name of the publisher as the name property.
Add a location property for the Corporation entity. Notice that the expected range is a type of
either Place or PostalAddress. Use a PostalAddress entity, filling in the addressLocality
and addressRegion properties.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr>
<td class="bibInfoLabel">Publication Info.</td>
<td class="bibInfoData">
<span property="publisher" typeof="Corporation" resource="#publisher">
<span property="location" typeof="PostalAddress">
<span property="addressLocality">New York</span>,
<span property="addressRegion">NY</span> :
</span>
<span property="name">Scholastic Inc.</span>,
</span>
<time property="datePublished">2014</time>.
</td>
</tr>
...
There is a second author for this book, Sean Williams, that should be reflected in the machine-readable markup.
Create a Person entity with the name, givenName, and familyName properties.
Add the birthDate property.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book">
...
<td class="bibInfoData"><span property="author" typeof="Person" resource="#author2">
<a href="/search~S12?/aWilliams%2C+Sean%2C+1967-/awilliams+sean+1967/-3,-1,0,B/browse">
<span resource="#author2">
<span property="name">
<span property="familyName">Williams</span>,
<span property="givenName">Sean</span>
</span>,
<span property="birthDate">1967</span>-
</span>
</a>
</span>
</td>
...
Checkpoint: Your HTML page should now look like step3/check_d.html
In this exercise, you learned:
How to assign multiple values to a single @property attribute
How to use @resource to group assertions for a single entity on a page
So far you have described the page using types and properties that are inside the page itself. But if you have to update some information that is common to many of your pages, that could be painful to roll out... and even if you have an automated process for updating that information across all of your pages, there is no guarantee that anything extracting data from your site will extract all of the updates at one time.
Fortunately, the problem of providing one copy of information on the web was solved at the same time the web was created: via the simple power of the link! And structured data is no different; in fact, linked data is a term that has emerged over the past few years marking a more pragmatic approach to building a web of structured data than the somewhat classically academic semantic web.
The following principles of linked data were first articulated by Tim Berners-Lee in a 2006 design note:
Keep these principles in mind as you work through the following steps!
Continue working with the HTML file that you have been editing so far, or for a fresh start, copy step3/check_d.html into a new file.
There are many sources of identifiers for people on the web. Some sources that you may find familiar include:
Assuming your underlying system has the ability to store and express
identifiers, you can help the machines disambiguate and retrieve more
information about your authors by linking to their identifiers from your
catalog page. Use the sameAs
property to add links from your
simple text representation of the authors of this book to external
resources.
Hint: To save you time in looking up identifiers, here is a set of identifiers for Garth Nix:
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<td class="bibInfoData" property="author copyrightHolder" typeof="Person" resource="#author1">
<a href= "/search~S12?/aNix%2C+Garth./anix+garth/-3,-1,0,B/browse">
<span resource="#author1">
<link property="sameAs" href="http://id.loc.gov/authorities/names/n96003050">
<link property="sameAs" href="http://viaf.org/viaf/39520118">
<link property="sameAs" href="http://dbpedia.org/resource/Garth_Nix">
<link property="sameAs" href="http://www.freebase.com/m/01qqfy">
<span property="name">
<span property="familyName">Nix</span>,
<span property="givenName">Garth</span>
</span>
</span>.</a>
</td>
...
Note: While it might be tempting to use the url
property, that is normally reserved for linking to a URL where the
thing that is described is available (for example, linking to a
downloadable podcast or e-book). In contrast, sameAs
is
used to link to a description of the thing.
Take a look at how the page has developed over time; there is now a lot of HTML markup just to describe the author, and you can imagine more markup if you were to express all of the "see from" and "see also" forms that might be contained in a local authority record. If your system uses local authority records, in fact, they are a perfect candidate for refactoring your markup. You can move the bulk of the markup from the bibliographic record display page into a separate page about the author, built on your local authority record. Then, once it is a separately displayed page, you can simply link to it from this page... as well as from any other pages that want to provide information about this author.
Create a new file named garthNix.html
in your text editor,
and copy the @resource="#author1"
markup into the file.
As the new file describes a single type, you can move the
declaration of the type into the <body>
element of
the new page, and you can (optionally) remove the @resource
attributes from the markup that you pasted into the file. Don't forget the
@vocab
declaration! Use your existing page as a template.
Use the RDFa parsers to ensure that the markup in the new file
expresses the same information as it did in the original file.
Repeat these steps to create seanWilliams.html
, using the
@resource="#author2"
markup as the source of interest.
<!DOCTYPE html>
<html>
<head>
<title>Garth Nix</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<link property="sameAs" href="http://id.loc.gov/authorities/names/n96003050">
<link property="sameAs" href="http://viaf.org/viaf/39520118">
<link property="sameAs" href="http://dbpedia.org/resource/Garth_Nix">
<link property="sameAs" href="http://www.freebase.com/m/01qqfy">
<span property="name"><span property="familyName">Nix</span>, <span property="givenName">Garth</span></span>.
</body>
</html>
<!DOCTYPE html>
<html>
<head>
<title>Sean Williams, 1967-</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<link property="sameAs" href="http://id.loc.gov/authorities/names/nr97009613">
<link property="sameAs" href="http://viaf.org/viaf/102404013">
<link property="sameAs" href="http://dbpedia.org/page/Sean_Williams_(author)">
<link property="sameAs" href="http://www.freebase.com/m/06z9bf">
<span property="name"><span property="familyName">Williams</span>,
<span property="givenName">Sean</span>
</span>,<span property="birthDate">1967</span>-
</body>
</html>
Now, replace the inline markup in the original page with a simple link
to your new file. You still want to state that "Author Name" is the
author of the book using the @property="author"
assertion, but now you can either add that property directly to an
<a>
element that links to your new file, or use the
resource
attribute to link to the external file instead of
the internal markup. This is a
signal to any RDFa parser that the linked resource contains the data
for the named property.
Note: "when the element contains the href
(or
src
) attribute, @property
is automatically
associated with the value of this attribute rather than the textual
content of the <a>
element" (Adida, Ben;
Birbeck, Mark; Herman, Ivan; Sporny, Manu. RDFa 1.1 Primer - Second
edition). Using a @property
attribute on the
same element as a @resource
attribute works in a similar
fashion; the target of the @resource
attribute is used as
the value of the @property
attribute.
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr>
<td class="bibInfoLabel">
Added Author</td>
<td class="bibInfoData" property="author" resource="seanWilliams.html#person">
<a href="/search~S12?/aWilliams%2C+Sean%2C+1967-/awilliams+sean+1967/-3,-1,0,B/browse">
Williams, Sean, 1967-
</a>
</td>
</tr>
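The example above uses the @resource variant. For comparison, here is a sketch of the other option mentioned in this step, with the property placed directly on an <a> element. It assumes that you are willing to point the visible link at garthNix.html instead of the catalog search URL.

```html
<tr>
<td class="bibInfoLabel">Author</td>
<td class="bibInfoData">
<!-- The href value, not the link text, becomes the value of the author property -->
<a property="author" href="garthNix.html#person">Nix, Garth.</a>
</td>
</tr>
```

The @resource variant leaves the existing search link intact, which may matter if your strict designer also forbids changes to the page's behaviour; the <a> variant needs less markup.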
Checkpoint: Your original HTML page should now look like step4/check_e.html and your new author HTML pages should look like step4/garthNix.html and step4/seanWilliams.html.
Now that you have created an entirely separate author page, you can add much more information about the author; for example, you can include an email address, links to their personal web sites and social media accounts, a list of their publications and previous talks... far more information than you would have wanted to publish inline in the catalog page itself.
Following the principles of linked data can lead not only to more efficient maintenance of information and (potentially) more useful results in search engines and other aggregators of data, but also to a better information design and experience for your users.
Use the Person
properties to flesh out the "about this author" page with properties
such as address
, birthDate
, email
, follows
, and telephone
. Be
adventurous, and remember to try to use nested types and ranges
appropriately!
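One way the fleshed-out author page could start is sketched below. Every contact value in it (the email address, telephone number, and locality) is a made-up placeholder rather than real data about the author, and the <p> wrappers are only one possible layout.

```html
<!DOCTYPE html>
<html>
<head>
<title>Garth Nix</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<span property="name"><span property="familyName">Nix</span>, <span property="givenName">Garth</span></span>.
<!-- Placeholder values below: replace them with data from your authority record -->
<p>Email: <span property="email">author@example.org</span></p>
<p>Telephone: <span property="telephone">+1-555-0100</span></p>
<!-- address expects a PostalAddress, so a nested entity is started here -->
<p property="address" typeof="PostalAddress">
<span property="addressLocality">Example City</span>,
<span property="addressRegion">EX</span>
</p>
</body>
</html>
```

The nested PostalAddress follows the same pattern that you used for the publisher's location: the @property attribute attaches the new entity to the Person, and the @typeof attribute starts a new scope for its own properties.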
In this exercise, you learned:
How to use the @property and @href attributes to link to data on another page
In this exercise, you will mark up subject headings. The first approach treats subject headings simply as keywords, which is appropriate for library systems that do not control subject headings or which do not expose the source for the subject headings. Then we will embellish our markup by treating the subject headings as part of an externally controlled vocabulary.
Identifying subject headings in the catalog page as simple text keywords can be useful for building a search engine that can provide relevance bumps based on the keywords, rather than relying on arbitrary text within the web page.
Find the subject headings in the page, mark them up using the schema.org
keywords
property,
and check your work.
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr>
<td class="bibInfoLabel">Subject</td>
<td class="bibInfoData" property="keywords">
<a href="/search~S12?/foo">Human-animal relationships -- Juvenile fiction.</a>
</td>
</tr>
While simple text keywords can be useful, we have learned that by linking to external entities, machines can disambiguate text and connect our work to the broader cloud of linked data.
Find matches for the subject headings in the page in http://id.loc.gov and mark them up as external
entities. This time, use the about
property, as it is
intended to identify "The subject matter of the content"--perfect for
our purposes. Then check your work.
Note: You can also continue to mark up the text of the subject headings as keywords, if you like; these approaches are compatible and different clients may use different approaches to consuming the data that you offer.
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr>
<td class="bibInfoLabel">Subject</td>
<td class="bibInfoData" property="keywords">
<link property="about" href="http://id.loc.gov/authorities/subjects/sh2008122001">
<a href="/search~S12?/foo">Human-animal relationships -- Juvenile fiction.</a>
</td>
</tr>
Checkpoint: Your original HTML page should now look like step4/check_f.html.
In this exercise, you learned:
How to use the keywords property to potentially improve the relevance of those keywords in search results in consuming applications;
How to link subject headings to external entities using the about property.
We have already seen that entities such as authors and subject headings often have other representations on the web to which we can connect our own
Freebase is a source of linked open data that uses its own schema to represent entities, including books to which specific editions are attached in a quasi-FRBR fashion.
Construct a link to the RDF description of the Freebase edition by taking its identifier (m/01069fkb) and appending it to https://www.googleapis.com/freebase/v1/rdf/.
Use the sameAs property to link your book to the Freebase edition.
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
<link property="sameAs" href="https://www.googleapis.com/freebase/v1/rdf/m/01069fkb">
...
OCLC recently announced that it had published 197 million open bibliographic work descriptions.
exampleOfWork has been proposed as a schema.org extension, but has not yet been accepted.
Use the sameAs property to link your book to the OCLC Work Description.
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
<link property="sameAs" href="https://www.googleapis.com/freebase/v1/rdf/m/01069fkb">
<link property="sameAs" href="http://worldcat.org/entity/work/id/1782516719">
...
You can also link to the publisher's description. Even though the publisher might not offer any linked data themselves, it might serve as a useful identifier for the machines.
Once again, you can use the sameAs
property to link your book to the publisher's description.
<!DOCTYPE html>
<body vocab="http://schema.org/" typeof="Book" resource="#book">
<link property="sameAs" href="https://www.googleapis.com/freebase/v1/rdf/m/01069fkb">
<link property="sameAs" href="http://worldcat.org/entity/work/id/1782516719">
<link property="sameAs" href="http://www.scholastic.ca/books/view/spirit-animals-book-three-blood-ties">
...
Checkpoint: Your original HTML page should now look like step4/check_g.html.
In this exercise, you learned how to link your bibliographic description to the bibliographic descriptions available from Freebase, OCLC, and publishers, as a means of giving machines more entry points into linked open data.
Dan Scott is a systems librarian at Laurentian University.
This work
is licensed under a Creative
Commons Attribution-ShareAlike 4.0 International License.