Ade Oshineye was singing the praises of Google+ Local pages
Some guy commented on Ade's G+ post about the presentation:
Still looking forward to a simple solution for
automated Page updates for non-spammy metadata like hours of operation (yes, I
was the guy who asked that question). Maybe we could teach G+ to scrape some
selected microdata from an authoritative linked Web site, if pushing seems too
dangerous?
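The scraping idea in that comment can be sketched in a few lines. This is only an illustration, assuming the authoritative site marks up its hours with the schema.org `openingHours` property in microdata; the class name `OpeningHoursScraper` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class OpeningHoursScraper(HTMLParser):
    """Collect itemprop="openingHours" values from microdata markup."""

    def __init__(self):
        super().__init__()
        self.hours = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("itemprop") == "openingHours":
            # Microdata keeps the machine-readable value in content= (on meta)
            # or datetime= (on time); the visible text is for humans.
            value = attributes.get("content") or attributes.get("datetime")
            if value:
                self.hours.append(value)

# Hypothetical page fragment published by the library itself.
page = """
<div itemscope itemtype="http://schema.org/Library">
  <span itemprop="name">Example Library</span>
  <time itemprop="openingHours" datetime="Mo-Fr 09:00-17:00">Weekdays 9 to 5</time>
</div>
"""

scraper = OpeningHoursScraper()
scraper.feed(page)
print(scraper.hours)
```

Pulling the value from the publisher's own page keeps the library as the authority for its hours, which is the point of the comment.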
The trouble with the web
Happy 25th, World Wide Web!
We should work toward a universal linked information system, in which generality
and portability are more important than fancy graphics techniques and complex extra facilities. [...] There is also much support from the publishing industry, and from librarians whose job it is to organise information.
<html>
<head>
<title>Fancy academic article</title>
</head>
<body>
<h1>The web as intellectual rising tide</h1>
<p>As my esteemed colleague
<a href="http://example.com/author">Foo Bar</a> has
<a href="http://example.com/article">astutely observed</a>
...
</p>
</body>
</html>
Bags of words are still hard
Plenty of ambiguity problems remain
When a web page mentions "Dan Scott", it could be:
the character from the One Tree Hill TV show
the artist from the Magic: The Gathering card game
the Ontario professor from the University of Waterloo
the Ontario academic librarian from Laurentian University
What about variations like "Daniel B. Scott", "Scott, Dan", "Scott, Dan, 1972-"?
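A rough sketch of why plain strings fall short. The function `normalize_name` and its rules are illustrative assumptions, not an established name-matching algorithm.

```python
import re

def normalize_name(raw):
    """Reduce a personal-name string to a rough 'given family' key."""
    # Drop trailing life dates such as ", 1972-" or ", 1950-2010".
    name = re.sub(r",?\s*\d{4}-(\d{4})?\s*$", "", raw.strip())
    # Invert "Family, Given" into "Given Family".
    if "," in name:
        family, given = [part.strip() for part in name.split(",", 1)]
        name = f"{given} {family}"
    # Drop single-letter initials such as "B."
    name = re.sub(r"\b[A-Z]\.\s*", "", name)
    # Lowercase and collapse whitespace.
    return " ".join(name.lower().split())

variants = ["Dan Scott", "Scott, Dan", "Scott, Dan, 1972-", "Daniel B. Scott"]
keys = {variant: normalize_name(variant) for variant in variants}
for variant, key in keys.items():
    print(f"{variant!r} -> {key!r}")
```

Three of the variants collapse to the key `dan scott`, while "Daniel B. Scott" stays separate as `daniel scott`. Even the shared key cannot say which of the four Dan Scotts a page means; only an identifier such as a URI can do that.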
Trouble with the Semantic Web
Tim Berners-Lee again
The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning,
better enabling computers and people to work in cooperation.
Scientific American, "The Semantic Web". May 17, 2001.
Tim Berners-Lee and Linked Data
For HTML or RDF, the same expectations apply to make the web grow:
Use URIs as names for things
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
Include links to other URIs so that they can discover more things.
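A minimal sketch of those four expectations as data, written as a JSON-LD-style Python dict. Every `example.org` URI is a placeholder, and the helper names `linked_uris` and `is_http_uri` are made up for this illustration.

```python
import json
from urllib.parse import urlparse

# All example.org URIs are placeholders, not real identifiers.
person = {
    "@context": "http://schema.org",
    # An HTTP URI names the thing, so people can look the name up.
    "@id": "http://example.org/people/dan-scott#me",
    # A standards-based description is what a lookup should return.
    "@type": "Person",
    "name": "Dan Scott",
    # A link to another URI lets a client discover more things.
    "worksFor": {"@id": "http://example.org/orgs/laurentian-university"},
}

def linked_uris(node):
    """Collect every @id found in a nested JSON-LD-style dict."""
    found = []
    for key, value in node.items():
        if key == "@id":
            found.append(value)
        elif isinstance(value, dict):
            found.extend(linked_uris(value))
    return found

def is_http_uri(uri):
    """True when the URI can be looked up over HTTP or HTTPS."""
    return urlparse(uri).scheme in ("http", "https")

print(json.dumps(person, indent=2))
print(all(is_http_uri(uri) for uri in linked_uris(person)))
```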
In June 2011, schema.org was announced by major search engines (Google, Bing, Yahoo; Yandex joined later that year)
Some goals:
Offer a simple vocabulary for the short tail of web results (events, products, people)
Enable ordinary web publishers to add schema.org markup via RDFa Lite or microdata without having to be Semantic Web experts
Enable search engines to aggregate data and apply finer-grained disambiguation and relevance strategies
A reaction to the state of the Semantic Web
schema.org: hierarchical types
schema.org: Creative Works
schema.org: Book
schema.org: focus on examples
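In the spirit of that focus on examples, here is a small sketch that renders a schema.org Book (a subtype of CreativeWork) as RDFa Lite. The function `book_rdfa` is hypothetical, and the title, author, and URI simply reuse the placeholder values from the HTML sample earlier in the deck.

```python
from html import escape

def book_rdfa(title, author, resource):
    """Render a minimal schema.org Book description as RDFa Lite markup."""
    return (
        f'<div vocab="http://schema.org/" typeof="Book" resource="{escape(resource)}">\n'
        f'  <h1 property="name">{escape(title)}</h1>\n'
        f'  <span property="author" typeof="Person">\n'
        f'    <span property="name">{escape(author)}</span>\n'
        f'  </span>\n'
        f'</div>'
    )

markup = book_rdfa(
    "The web as intellectual rising tide",
    "Foo Bar",
    "http://example.com/article",
)
print(markup)
```

Only three attributes carry the meaning: `vocab` picks the vocabulary, `typeof` names the type, and `property` names each field. The author is an embedded Person rather than a plain string, which is what lets a consumer attach further facts to that person.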
Libraries: early technology adopters
Telnet access to catalogues
Z39.50 protocol for sharing records
OpenURL protocol for resolving article requests
COinS microformat for embedding citations in HTML
unAPI for offering different metadata representations
Consistent flaw: almost entirely library-specific technologies
MARC: (still) kind of a big deal
If your holdings are not in OCLC, you're not linked to Google Books.
Our decentralized infrastructure has been centralized.
Linked data library system leadership
Some library systems have supplemented MARC with a linked data model, or replaced it outright:
Swedish Union Catalog
German National Library
Bibliothèque nationale de France
But is your library...
Large (national scale)
Able to sustain multi-year development efforts with concurrent systems
Able to mandate significant changes to established practices
...?
Better living through Web standards
Hypothesis: We can iterate towards linked data with existing systems
Enhance our catalogues with web standards:
Persistent URIs
HTML5
RDFa (or microdata) expressing schema.org
Sitemaps listing all the URIs of interest
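The sitemap piece is the simplest of the four to sketch. This assumes the catalogue can already list its persistent record URIs; the function `build_sitemap` and the `catalogue.example.org` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(record_uris):
    """Return sitemap XML as a string, with one <url><loc> entry per URI."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for uri in record_uris:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = uri
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical persistent URIs for two catalogue records.
sitemap = build_sitemap([
    "http://catalogue.example.org/record/1",
    "http://catalogue.example.org/record/2",
])
print(sitemap)
```

A crawler that reads this file no longer has to guess its way through search forms to find each record page.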
Reality check: 2012 Common Crawl
Ronallo(*) found that American academic libraries published fewer than 10,000 schema.org instances in total
Possible reasons for this low adoption rate include:
Perceived return on investment is low in risk-averse, under-resourced institutions
Proprietary systems do not facilitate shared modifications of HTML templates
Sufficient access to underlying metadata may not be available in proprietary systems
Crawling: Evergreen early days
Simplistic implementation of schema.org via microdata with the 2.2 and 2.3 releases:
title/author/keyword properties
Direct plain text values (no embedded types)
2.4 release broke entities out as Person and Organization types, and separated birthDate and deathDate values
2.5 release switched to RDFa, exposed holdings as Offer types
2.6 release includes links from holdings to Library types
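The shape of that later model, a Book with holdings as Offers that link to a Library, can be sketched as a JSON-LD-style structure. This is not Evergreen's actual output (the releases above publish RDFa in HTML); the URIs, call number, and the choice of `seller`, `sku`, and `availability` properties are assumptions for illustration.

```python
import json

# Hypothetical library description; the URI is a placeholder.
library = {
    "@id": "http://catalogue.example.org/library/1",
    "@type": "Library",
    "name": "Example Branch Library",
    "openingHours": "Mo-Fr 09:00-17:00",
}

def holding_as_offer(call_number, available, library_uri):
    """Describe one holding as a schema.org Offer linked to its library."""
    status = "InStock" if available else "OutOfStock"
    return {
        "@type": "Offer",
        "sku": call_number,
        "availability": f"http://schema.org/{status}",
        # Link by URI rather than repeating the library's details per holding.
        "seller": {"@id": library_uri},
    }

book = {
    "@context": "http://schema.org",
    "@id": "http://catalogue.example.org/record/1",
    "@type": "Book",
    "name": "The web as intellectual rising tide",
    "offers": [holding_as_offer("Z 666 .S36", True, library["@id"])],
}

print(json.dumps(book, indent=2))
```

Because each Offer points at the Library by URI, the hours of operation live in one place and every holding picks them up.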
Open source implementations
My efforts have:
Enhanced Evergreen, Koha, and VuFind to publish schema.org data
Enhanced Evergreen to expose Library metadata (location, contact information, hours of operation) linked from offers
Independently, the Blacklight library system was enhanced to publish schema.org data
Approximately 4,000 library systems will publish schema.org data as sites adopt the latest releases
We're all in this together: SchemaBibEx
The mission of this group is to discuss and prepare proposal(s) for extending Schema.org schemas for the improved representation of bibliographic information markup and sharing.
Founded by Richard Wallis (OCLC) in September 2012
W3C Schema Bib Extend Community Group - best practices for schema.org bibliographic use cases, extension proposals in progress, mailing list for assistance