White hat SEO:

Structured Web Data

OLA SuperConference 2015

Dan Scott / +Dan Scott
Systems Librarian, Laurentian University

Search engines vs. libraries

  • Goal: Connect users with your resources using their search tool.
    • People are working on this problem!
    • Ask me about Evergreen, Koha, or VuFind!
    • Also Libhub via BIBFRAME
  • But first: Connect users with your library using their search tool.

When you search for "library"...

Getting your library into search

  • Populate social media profiles with branch addresses, hours, contact info, pictures...
    • Google: create Google+ pages
    • Bing: populate Foursquare
    • Yahoo!: maybe take out a Yellow Pages ad?
  • But that adds more places to maintain copies of data

OCLC to the rescue?

  • The WorldCat Registry claims
    • to be the "Authoritative single source for institutional metadata"
    • that "Built-in Web services distribute Registry data across the Web, enhancing Web discovery of your content and services"
  • Alas, the claims differ wildly from reality.
WorldCat Registry showing almost no data for Greater Sudbury Public Library
WorldCat Registry showing some data for J.N. Desmarais Library
WorldCat Registry showing almost no data for Toronto Public Library

Are you convinced we have a visibility challenge?

SEO is not a dirty word

[SEO] is a valid practice which seeks to improve technical and content aspects of a website, making the content easier to find, relevant, and more accessible to the search engine crawlers.

Web site prayer

Grant me the serenity to accept the things I cannot change,

Courage to change the things I can,

And wisdom to know the difference.

Reinhold Niebuhr via Sinead O'Connor

Not just structured data

Take care of the basics first!

  • Cool, persistent URLs
  • Accessible HTML
  • robots.txt
  • Sitemaps

Cool, persistent URLs

  • Cool URIs don't change
  • Links to your web content increase its ranking
  • Hide the publishing mechanisms and focus on the content!
    • Bad: http://example.ca/CMS-product/index.php?JSESSIONID=abc123;page=hours
    • Good: http://example.ca/hours

Accessible HTML

  • Cynthia Ng talks about accessibility as Universal Design
    • Use meaningful HTML elements, @class attributes, and CSS.
    • Don't use an image where text would do.
    • ALT text, captions, and transcriptions help machines as well as humans.
  • But you're all AODA-compliant, right?
  • Check with WAVE (wave.webaim.org), the Web Accessibility Evaluation Tool

Blocking the robots: robots.txt

  • Standard format for telling robots what not to crawl.
  • Always found at root of website (http://example.ca/robots.txt).
  • Specifies web site directories to block; default is to allow everything.
  • Obeyed by well-behaved robots (includes major search engines).

Search results, facets, filters, etc


http://catalogue.example.com/search?term=fish 
http://catalogue.example.com/search?term=fishes
http://catalogue.example.com/search?term=fishing
                        

Search engines will repeatedly crawl variants of these.

Block them!


User-agent: *
Disallow: /search
                        

Don't block everything!


User-agent: *
Disallow: /
                        

This is not a good thing to see on your site.

You're telling search engines you don't want to be found.
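A quick way to check what a robots.txt rule really blocks is Python's standard urllib.robotparser module. The sketch below is illustrative only: the ExampleBot agent name and the /hours URL are assumptions, and real crawlers may interpret rules slightly differently from this parser. It compares a /search prefix, a /search/ prefix, and a blanket Disallow: / against the search URL shown earlier.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rule sets, written out as robots.txt file contents.
BLOCK_SEARCH = "User-agent: *\nDisallow: /search\n"
BLOCK_SEARCH_DIR = "User-agent: *\nDisallow: /search/\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

SEARCH_URL = "http://catalogue.example.com/search?term=fish"
HOURS_URL = "http://catalogue.example.com/hours"  # assumed page, for contrast

for label, rules in [
    ("Disallow: /search", BLOCK_SEARCH),
    ("Disallow: /search/", BLOCK_SEARCH_DIR),
    ("Disallow: /", BLOCK_ALL),
]:
    # Print whether the search URL and the hours page remain crawlable.
    print(label, allowed(rules, SEARCH_URL), allowed(rules, HOURS_URL))
```

Rules match by path prefix, so in this parser a trailing-slash pattern such as /search/ does not cover /search?term=fish, while /search does.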

Guiding the machines: sitemaps

  • XML format that lists every URL of interest for robots


   
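<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://example.com/about</loc>
      <lastmod>2015-01-29</lastmod>
   </url>
   <url>
      <loc>http://example.com/hours</loc>
      <lastmod>2015-01-25</lastmod>
   </url>
</urlset>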
   

                        

Advertise your sitemap

Just add a Sitemap: line to your robots.txt file:


Sitemap: https://example.ca/sitemapindex.xml
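If your CMS does not produce a sitemap for you, a small script can. This is a minimal sketch using Python's standard xml.etree.ElementTree, assuming you already have a list of page URLs and last-modified dates; the build_sitemap helper is hypothetical. It writes a plain urlset rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, 'YYYY-MM-DD' last-modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# The two pages from the sitemap example.
sitemap_xml = build_sitemap([
    ("http://example.com/about", "2015-01-29"),
    ("http://example.com/hours", "2015-01-25"),
])
print(sitemap_xml)
```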
                        

Structured data: why?

Even clean HTML is still just a bag of words (and media) to machines.

Social media sharing text instead of article text
Article with title 'Change text size for the story'

Enter schema.org

  • In June 2011, schema.org was announced by the major search engines; goals were:
    • Offer a simple vocabulary for the most common entities on the web (events, products, people)
    • Enable normal webmasters to add schema.org markup without needing Semantic Web expertise
    • Help search engines aggregate data and apply finer-grained disambiguation and relevance strategies

Library data of interest

My local library branch page contains:

  • Library name
  • Address
  • Home page
  • Phone number
  • Fax number
  • Hours of operation
  • Description of available services
  • Picture
  • Related branches

What's in schema.org for libraries?

On scope in HTML documents

Scope refers to the contents of a given HTML element, including all of its child elements.


<body>

    

</body>

RDFa vs. microdata vs. JSON-LD

  • RDFa and microdata are attributes that you add to your existing HTML
  • JSON-LD is a more recent, complex option
  • schema.org includes examples for all three types of markup
  • We're going to be using RDFa (a W3C standard) for our examples

Say you're using schema.org

  • The @vocab attribute specifies the default vocabulary
  • Default applies to the entire scope
  • So specify it on the <body> element

<body vocab="http://schema.org/">
...
</body>
                        

Say you're describing a Library

  • The @typeof attribute identifies the type you're describing
  • In general, properties in scope will be applied to that type
  • Let's specify the type on the <body> element, too

<body vocab="http://schema.org/" typeof="Library">
...
</body>
                        

Identify your library's name

  • http://schema.org/name is defined as part of Thing
  • name is used for everything: people, organizations, book and movie titles...
  • To declare a property of a type in RDFa, use the @property attribute:

<body vocab="http://schema.org/" typeof="Library">
...
  

  <h1 property="name">South End Library</h1>

... </body>

Picture this

  • image identifies relevant images for the given entity
  • Guides apps in choosing an image for social media posts, search results, etc
  • In this case, the South End Library sign seems like the best choice:

<body vocab="http://schema.org/" typeof="Library">
...
  

  <h1 property="name">South End Library</h1>
  <img property="image" src="..." alt="Exterior view of the South End Library">

... </body>

Testing your work

Standards vs. expectations

Google Structured Data Testing Tool shows missing logo and url properties

schema.org does not require logo and url properties on Library, but Google expects them

Declare the logo

  • There is a logo at the top of the page
  • Order of properties does not matter to schema.org, so let's declare it:

<body vocab="http://schema.org/" typeof="Library">
...
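  <img property="logo" src="..." alt="Company Logo">
  ...
  <h1 property="name">South End Library</h1>
  <img property="image" src="..." alt="Exterior view of the South End Library">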

... </body>

Declare the url

  • "But the page lives at a URL!" you might be thinking
  • There can be more than one entity described on a single page, so url helps uniquely identify them.
  • The navigation links to this page, so we can reuse it:

<body vocab="http://schema.org/" typeof="Library">
...
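  <img property="logo" src="..." alt="Company Logo">
  ...
  <a property="url" href="...">South End Library</a>
  ...
  <h1 property="name">South End Library</h1>
  <img property="image" src="..." alt="Exterior view of the South End Library">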

... </body>

Wait; @property gets values how?

  • Usually @property takes the value of the child text nodes:
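    • <span property="keywords"><b>One</b><b>Two</b></span>
      keywords gets "OneTwo"
  • But if an @href or @src is on the same element, the property gets that value instead:
    • <a property="url" href="/example">Library name</a>
      <img property="image" src="/example.png">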
      url gets "/example" and image gets "/example.png"
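To make the value rule concrete, here is a toy extractor built on Python's standard html.parser. It is a sketch under stated assumptions, not a real RDFa processor: the sample markup is illustrative, the input is assumed to be well-formed, and relative URLs are kept as written rather than resolved against the page address.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot contain text.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class PropertyValues(HTMLParser):
    """Collect one value per @property: href/src if present, else the text."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._open = []  # one (property or None, text parts) entry per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        link = attrs.get("href") or attrs.get("src")
        if prop and link is not None:
            self.values[prop] = link  # @href / @src wins over the text
            prop = None
        if tag not in VOID:
            self._open.append((prop, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._open:
            return
        prop, parts = self._open.pop()
        if prop:
            self.values[prop] = "".join(parts)


# Illustrative markup (an assumption, not taken from a real page).
SAMPLE = """
<span property="keywords"><b>One</b><b>Two</b></span>
<a property="url" href="/example">Library name</a>
<img property="image" src="/example.png">
"""

extractor = PropertyValues()
extractor.feed(SAMPLE)
extractor.close()
print(extractor.values)
```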

Locating the library


<body vocab="http://schema.org/" typeof="Library">
    

  <div property="address">
    1991 Regent Street<br>
    Sudbury, ON P3E 5V3<br>
    Phone: (705) 688-3950<br>
    Fax: (705) 522-7788
  </div>

</body>

A more precise address

  • Right now the value of our address includes the phone and fax numbers
  • Create a more contained scope with a new span tag:

<body vocab="http://schema.org/" typeof="Library">
    

  <div>
    <span property="address">
      1991 Regent Street<br>
      Sudbury, ON P3E 5V3
    </span><br>
    Phone: (705) 688-3950<br>
    Fax: (705) 522-7788
  </div>

</body>

Phone and fax numbers

  • Now that the phone and fax numbers are separate, we can identify those
  • New span tags required again:

<body vocab="http://schema.org/" typeof="Library">
    

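  <div>
    <span property="address">
      1991 Regent Street<br>
      Sudbury, ON P3E 5V3
    </span><br>
    Phone: <span property="telephone">(705) 688-3950</span><br>
    Fax: <span property="faxNumber">(705) 522-7788</span>
  </div>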

</body>

address revisited

  • address expects a specific type as a value: PostalAddress
  • Machines expect humans to get things wrong or be lazy
  • Will do their best with plain Text values where structured types are expected
  • But we're better than that! Add a @typeof=PostalAddress declaration:

<body vocab="http://schema.org/" typeof="Library">
    

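  <div>
    <span property="address" typeof="PostalAddress">
      1991 Regent Street<br>
      Sudbury, ON P3E 5V3
    </span><br>
    Phone: <span property="telephone">(705) 688-3950</span><br>
    Fax: <span property="faxNumber">(705) 522-7788</span>
  </div>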

</body>

PostalAddress


<body vocab="http://schema.org/" typeof="Library">
    

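  <div>
    <span property="address" typeof="PostalAddress">
      <span property="streetAddress">1991 Regent Street</span><br>
      <span property="addressLocality">Sudbury</span>,
      <span property="addressRegion">ON</span>
      <span property="postalCode">P3E 5V3</span>
    </span><br>
    Phone: <span property="telephone">(705) 688-3950</span><br>
    Fax: <span property="faxNumber">(705) 522-7788</span>
  </div>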

</body>
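One way to see how scope decides which entity a property belongs to is to read the markup back with a toy parser. The following sketch uses Python's standard html.parser; the markup string is an assumed rendering of this address block, the ScopeReader class is hypothetical, and it ignores most of what a real RDFa processor does (vocabularies, IRIs, multiple values).

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class ScopeReader(HTMLParser):
    """Toy reader: attach each @property to the nearest enclosing @typeof."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._items = []  # open @typeof scopes, innermost last
        self._open = []   # (item opened here, property, text parts) per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        prop, typ = attrs.get("property"), attrs.get("typeof")
        item = None
        if typ:
            item = {"@type": typ}
            if self._items and prop:
                self._items[-1][prop] = item  # nested entity is the value
            elif self.root is None:
                self.root = item
            self._items.append(item)
            prop = None
        self._open.append((item, prop, []))

    def handle_data(self, data):
        for _, _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._open:
            return
        item, prop, parts = self._open.pop()
        if item is not None:
            self._items.pop()  # leaving this entity's scope
        elif prop and self._items:
            self._items[-1][prop] = " ".join("".join(parts).split())


# Assumed markup for the address block.
MARKUP = """
<body vocab="http://schema.org/" typeof="Library">
  <span property="address" typeof="PostalAddress">
    <span property="streetAddress">1991 Regent Street</span><br>
    <span property="addressLocality">Sudbury</span>,
    <span property="addressRegion">ON</span>
    <span property="postalCode">P3E 5V3</span>
  </span><br>
  Phone: <span property="telephone">(705) 688-3950</span><br>
  Fax: <span property="faxNumber">(705) 522-7788</span>
</body>
"""

reader = ScopeReader()
reader.feed(MARKUP)
reader.close()
library = reader.root
print(library)
```

The street, locality, region, and postal code land on the PostalAddress, while the phone and fax numbers sit outside that span and so stay on the Library.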

Hours of operation

OpeningHoursSpecification


<body vocab="http://schema.org/" typeof="Library">
<tr property="openingHoursSpecification" typeof="OpeningHoursSpecification">
  <td property="dayOfWeek" href="http://purl.org/goodrelations/v1#Sunday">
    Sunday**</td>
  <td>
    - </td>
</tr>
</body>
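Hours tables are repetitive, so the rows are easier to generate than to type. The sketch below is one possible approach in Python: the hours_row helper and the Sunday hours are made up for illustration, and it assumes the opens and closes properties of OpeningHoursSpecification are expressed with time elements.

```python
from datetime import time

GR = "http://purl.org/goodrelations/v1#"  # day-of-week URIs, as in the row above


def hours_row(day, opens, closes):
    """Render one table row of opening hours with RDFa attributes."""

    def stamp(prop, value):
        machine = value.strftime("%H:%M")  # 24-hour HH:MM for the datetime attribute
        return f'<time property="{prop}" datetime="{machine}">{machine}</time>'

    return (
        '<tr property="openingHoursSpecification" typeof="OpeningHoursSpecification">\n'
        f'  <td property="dayOfWeek" href="{GR}{day}">{day}</td>\n'
        f'  <td>{stamp("opens", opens)} - {stamp("closes", closes)}</td>\n'
        '</tr>'
    )


# Hypothetical hours; substitute your branch's real schedule.
row = hours_row("Sunday", time(13, 0), time(17, 0))
print(row)
```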
                        

Related branches

  • Library offers a branchOf property (via LocalBusiness)
  • By linking to our parent organization, robots can infer that other Libraries linking to the same parent are related branches
  • Simplest approach is to link to the overall home page for the organization:

<body vocab="http://schema.org/" typeof="Library">
...
  
  <a property="branchOf" href="..."><img property="logo" src="..." alt="Company Logo"></a>
  
...
</body>
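The reasoning a robot might apply to branchOf can be sketched in a few lines of Python. Everything here is hypothetical: the branch names other than South End Library, the URLs, and the related_branches helper are assumptions for illustration, not how any particular search engine works.

```python
from collections import defaultdict

# Hypothetical records, as a crawler might extract them from branch pages.
libraries = [
    {"name": "South End Library", "branchOf": "http://example.ca/"},
    {"name": "Main Library", "branchOf": "http://example.ca/"},
    {"name": "Another Town Library", "branchOf": "http://example.org/"},
]


def related_branches(records):
    """Group library names by the parent they declare via branchOf."""
    groups = defaultdict(list)
    for record in records:
        groups[record["branchOf"]].append(record["name"])
    return dict(groups)


branches = related_branches(libraries)
print(branches)
```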
                        

Events

  • We all want more uptake of the events we offer
  • The Event type enables us to advertise these in a machine-readable way
  • Aggregators can list events at a given date / time across organizations and help get the word out

Sample event


Family - Story Time @ South End Library
Description: Develop your child's appreciation of language, rhythm and imagination through storytelling, puppetry, songs, finger plays and rhymes.
Date: Thursday - January 15 2015
Time: 10:30 AM - 11:30 AM
Location: South Branch
Program Type: Family
Series: Thursdays, 09-18-2014 to 12-11-2014, 10:30AM - 11:30AM
Public Note: Free! Children under 12 must be accompanied by a parent or guardian at all times while in the library.
30 Seats Remaining

Name and description


Family - Story Time @ South End Library
Description: Develop your child's appreciation of language, rhythm and imagination through storytelling, puppetry, songs, finger plays and rhymes.
Date: Thursday - January 15 2015
Time: 10:30 AM - 11:30 AM
Location: South Branch
Program Type: Family
Public Note: Free! Children under 12 must be accompanied by a parent or guardian at all times while in the library.
30 Seats Remaining

Location and organizer

  • The South Branch is both the organizer and the location of the event
  • We can link to the branch and include both properties (separated by a space) in the @property attribute

Family - Story Time @ South End Library
Description: Develop your child's appreciation of language, rhythm and imagination through storytelling, puppetry, songs, finger plays and rhymes.
Date: Thursday - January 15 2015
Time: 10:30 AM - 11:30 AM
Location: South Branch
Program Type: Family
Public Note: Free! Children under 12 must be accompanied by a parent or guardian at all times while in the library.
30 Seats Remaining
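A space-separated @property value behaves like several properties sharing one target. The toy Python parser below illustrates that, assuming a link such as the one in SAMPLE; the /branches/south-end address and the LinkProperties class are hypothetical.

```python
from html.parser import HTMLParser


class LinkProperties(HTMLParser):
    """Toy reader: emit one (property, target) pair per name in @property."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        target = attrs.get("href")
        names = attrs.get("property")
        if target and names:
            # Each whitespace-separated name gets the same target.
            for name in names.split():
                self.pairs.append((name, target))


# Assumed markup for the "Location" line of the event.
SAMPLE = (
    'Location: <a property="location organizer" '
    'href="/branches/south-end">South Branch</a>'
)

links = LinkProperties()
links.feed(SAMPLE)
links.close()
print(links.pairs)
```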

Start and end times

  • Events have a startDate and an endDate
  • These expect specifically formatted ISO 8601 date/time values; we'll use a time element with a datetime attribute to express them

Family - Story Time @ South End Library
Description: Develop your child's appreciation of language, rhythm and imagination through storytelling, puppetry, songs, finger plays and rhymes.
Date: Thursday - January 15 2015
Time: 10:30 AM - 11:30 AM
Location: South Branch
Program Type: Family
Public Note: Free! Children under 12 must be accompanied by a parent or guardian at all times while in the library.
30 Seats Remaining
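Turning the human-readable date and time into ISO 8601 is a good job for a script. This Python sketch is one way to do it, with stated assumptions: the time_element helper is hypothetical, month and AM/PM parsing depends on an English locale, and the result carries no time zone offset, which you may want to add for real events.

```python
from datetime import datetime


def time_element(prop, date_text, time_text):
    """Wrap a displayed time in a <time> element with an ISO 8601 datetime."""
    moment = datetime.strptime(f"{date_text} {time_text}", "%B %d %Y %I:%M %p")
    return (
        f'<time property="{prop}" datetime="{moment.isoformat()}">'
        f"{time_text}</time>"
    )


# Date and times from the sample event.
start = time_element("startDate", "January 15 2015", "10:30 AM")
end = time_element("endDate", "January 15 2015", "11:30 AM")
print(f"Time: {start} - {end}")
```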

Typical age range

  • The typicalAgeRange property specifies a target audience
  • It takes just a simple text range:

Family - Story Time @ South End Library
Description: Develop your child's appreciation of language, rhythm and imagination through storytelling, puppetry, songs, finger plays and rhymes.
Date: Thursday - January 15 2015
Time: 10:30 AM - 11:30 AM
Location: South Branch
Program Type: Family
Public Note: Free! Children under 12 must be accompanied by a parent or guardian at all times while in the library.
30 Seats Remaining