Wikidata workshop for librarians

Objectives and learning outcomes

Participants in this workshop will:

Log in

You could edit anonymously, but why? Building a set of contributions to your name gives you bragging rights and enables you to participate in Wikimedia Foundation elections.

Your username and password are the same as your credentials for Wikipedia, Wikimedia Commons, and the other Wikimedia projects.

Wikidata Sandbox

You can edit the special Wikidata Sandbox without having to worry about it actually affecting anything: https://www.wikidata.org/wiki/Q4115189 

Licensing

Everything in Wikidata is licensed under the Creative Commons Public Domain Dedication (CC0) license: effectively public domain. The theory is that:

  1. simple statements of fact are not subject to copyright, so a license isn't necessary; and
  2. the CC0 license enables the content to be reused as broadly as possible.

By contributing to Wikidata, you agree to license your contributions under the CC0 license.

By the same token, you can use any of the content in Wikidata without worrying about licensing obligations.

This is important! OCLC recommends the Open Data Commons Attribution (ODC-BY) license for aggregations of MARC records, and in a mixed-provenance world, meeting that license's strict legal requirement to attribute the originators of the data quickly becomes onerous.

Impact

Because Wikidata is a source of openly licensed, machine-readable data, a reasonably skilled developer can easily integrate its data into applications.

For example, in this proof of concept in the Laurentian University library catalogue (https://laurentian.concat.ca/eg/opac/record/353368?query=Rock%20music;qtype=subject;fi%3Asearch_format=music), record detail pages contain JavaScript that checks whether the item is described as a "Music Album". If so, the script adds a music note (♩) beside the first listed contributor (performer, artist, composer, etc.).

Clicking the note queries Wikidata to find any bands or musicians with a matching name, and returns the best match with a subset of data that might be of interest, including its description, an image, and the Wikidata and Songkick IDs.
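The lookup step can be approximated with a SPARQL query against the Wikidata Query Service. The sketch below is not the proof of concept's actual code; it is a minimal Python illustration that only builds the request URL. It assumes an exact English label match, treats "band or musician" as an instance of band (Q215380) or someone whose occupation is musician (Q639669), subclasses included, and assumes P3478 is the Songkick artist ID property (verify that on Wikidata before relying on it).

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_artist_query(name: str, lang: str = "en") -> str:
    """SPARQL for bands or musicians whose label exactly matches `name`."""
    # Escape backslashes and double quotes so `name` is a valid SPARQL string literal.
    literal = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?image ?songkick WHERE {{
  ?item rdfs:label "{literal}"@{lang} .
  # Instance of band (Q215380) or a subclass, or occupation musician (Q639669) or a subclass.
  {{ ?item wdt:P31/wdt:P279* wd:Q215380 . }}
  UNION
  {{ ?item wdt:P106/wdt:P279* wd:Q639669 . }}
  OPTIONAL {{ ?item wdt:P18 ?image . }}      # image (file on Wikimedia Commons)
  OPTIONAL {{ ?item wdt:P3478 ?songkick . }} # assumed: Songkick artist ID
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
LIMIT 5
""".strip()

def build_request_url(query: str) -> str:
    """Encode the query as a GET request asking WDQS for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

url = build_request_url(build_artist_query("Buffy Sainte-Marie"))
```

Any HTTP client can fetch that URL (send a descriptive User-Agent, as Wikimedia services ask); the JSON response lists matches under `results.bindings`, and deciding which one is the "best" match is left to the application.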

I created this proof of concept in about six hours. Getting the data from Wikidata was straightforward; most of my effort went into figuring out how to display the data in a reasonable way.

Entities and statements, properties, values

Everything in Wikidata is either an entity (a person, place, thing, or concept) or a statement: a property of an entity that has a specific value.

Helpful diagram: https://www.wikidata.org/wiki/Wikidata:Introduction#/media/File:Datamodel_in_Wikidata.svg 

Helpful Wikidata tour: https://www.wikidata.org/wiki/Wikidata:Tours 

Entities

For example, Ludwig van Beethoven is a Wikidata entity that was assigned the entity ID "Q255". You can find the Wikidata entry at https://wikidata.org/entity/Q255, which (eventually) redirects to https://www.wikidata.org/wiki/Q255: Wikidata recognizes, from the request's Accept header, that a web browser is asking for the entity, and it serves a human-friendly version of the data. See https://www.wikidata.org/wiki/Wikidata:Data_access#Linked_Data_Interface for the gory details.
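The same content negotiation is easy to observe from a script. A minimal Python sketch, standard library only, that builds (but does not send) two kinds of request: one for the concept URI with an explicit Accept header, and one for a Special:EntityData URL where the file extension (for example json or ttl) selects the format. The function names and the User-Agent string are placeholders.

```python
import urllib.request

ENTITY_BASE = "https://www.wikidata.org/entity/"
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData/"

def entity_data_url(entity_id: str, fmt: str = "json") -> str:
    """Format chosen by file extension instead of by Accept header."""
    return f"{ENTITY_DATA_BASE}{entity_id}.{fmt}"

def negotiated_request(entity_id: str, mime_type: str = "application/json") -> urllib.request.Request:
    """Request for the concept URI; the server redirects based on the Accept header."""
    return urllib.request.Request(
        ENTITY_BASE + entity_id,
        headers={
            "Accept": mime_type,
            "User-Agent": "workshop-example/0.1 (placeholder contact)",
        },
    )

turtle_request = negotiated_request("Q255", "text/turtle")
# urllib.request.urlopen(turtle_request) would follow the redirects to Turtle data.
```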

You will note that the entity for Ludwig van Beethoven has very little narrative text: the label, description, and alternate names (aliases) are shown in multiple languages, followed by a simple table of the entity's properties and their values.
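The JSON form of an entity has the same shape: labels, descriptions, and aliases are keyed by language code, and statements live under `claims`, grouped by property ID. The excerpt below is hand-trimmed for illustration (a real response nests it under `entities` and is far larger), and the helper names are hypothetical.

```python
# Hand-trimmed excerpt shaped like the JSON for Q255; real data also carries
# descriptions, aliases, sitelinks, qualifiers, and references.
beethoven = {
    "id": "Q255",
    "labels": {"en": {"language": "en", "value": "Ludwig van Beethoven"}},
    "claims": {
        "P31": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P31",
                "datavalue": {"type": "wikibase-entityid",
                              "value": {"entity-type": "item", "id": "Q5"}},
            },
            "rank": "normal",
        }],
    },
}

def label(entity: dict, lang: str = "en"):
    """Label in `lang`, or None if there is none."""
    entry = entity.get("labels", {}).get(lang)
    return entry["value"] if entry else None

def item_values(entity: dict, prop: str) -> list:
    """Item IDs used as values of an item-valued `prop`; skips 'no value'/'unknown value' snaks."""
    values = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        if snak["snaktype"] == "value":
            values.append(snak["datavalue"]["value"]["id"])
    return values
```

Here `item_values(beethoven, "P31")` gives `["Q5"]`, the "instance of: human" statement.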

Notability

You are probably familiar with the notability requirement for Wikipedia articles, which states that to avoid deletion, a topic must generally have "received significant coverage in reliable sources that are independent of the subject".

The bar for a Wikidata entity, on the other hand, is much lower. Per https://www.wikidata.org/wiki/Wikidata:Notability, an entity must meet at least one of the following criteria:

  1. It contains at least one valid sitelink to a page on Wikipedia, Wikivoyage, Wikisource, Wikiquote, Wikinews, Wikibooks, Wikidata, Wikispecies, Wikiversity, or Wikimedia Commons.
  2. It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references. If there is no item about you yet, you are probably not notable.
  3. It fulfills some structural need, for example: it is needed to make statements made in other items more useful.

The "fulfills some structural need" criterion enables the creation of Wikidata entities so that you can link to each member of a band, for example, or to every performing artist (band or individual) in a music festival. Any statements about these entities should, where possible, include supporting references from secondary sources.

Properties

Properties are used to create statements, or claims, about entities with corresponding values. These statements can express relationships between entities--for example, Molly Guldemond (Q28549437) is a sibling of Ryan Guldemond (Q18155774); Henry Rollins (Q318509) was born in Washington, D.C. (Q61)--or statements of fact that have a chronological or quantitative nature, such as the date an event was held, the population of a city at a given point in time, or the geographical coordinates of a place.

Properties are, in fact, also entities, because they are concepts, and their attributes can therefore be described by properties. In this way, Wikidata is self-documenting.

For example, on Beethoven's entry, click the property link "instance of" (the property label on the left-hand side of the page, not the value on the right-hand side). https://www.wikidata.org/entity/P31 redirects to https://www.wikidata.org/wiki/Property:P31 for a human-friendly view of the statements about the "instance of" property itself.

Instance of (P31)

In a wonderful example of tautology or recursion or recursive tautology, the first property shown for the "instance of" property is its "instance of" property, which tells us that the instance of property is an instance of both the "Wikidata property" (Q18616576) and the "Wikidata property for the relationship of the element to its class" (Q28326730).

The "Wikidata property" is in turn an instance of "abstract object" (Q7184903), and if you explore that, you'll find out that "abstract object" is not an instance of anything, but instead is a "subclass of" (P279) both:

My undergraduate degree includes a major in philosophy, and this is about as far up the hierarchy as I want to follow things. But we can and should talk about the difference between an "instance of" something and a "subclass of" something, because understanding it will actually help you when you are editing Wikidata.

"Instance of" vs. "Subclass of"

An instance is an example of a given class of some thing. For example, my Jazz Bass Special is an instance of the class of instruments called "electric bass guitars" (Q46185). If you have an electric bass guitar too, then that's another instance. We can compare them and differentiate them based on their properties: manufacturer, country of manufacture, colour, number of strings, neck length, fretted or fretless, wood, serial number, etc.

But as a class of instruments, an electric bass guitar is sufficiently different from a guitar that it merits its own class. To simplify the representation of every possible kind of entity in the world, Wikidata relies heavily on its class hierarchy to generate specific classes from more generic classes.

In the case of "electric bass guitar", it is a subclass of three separate classes:

Each of these is a subclass of musical instrument (Q34379), albeit with some levels in between.
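Those in-between levels can also be listed with a SPARQL property path. In this sketch, `wdt:P279*` follows zero or more "subclass of" links, so the query returns every class on a path from electric bass guitar (Q46185) up to musical instrument (Q34379), both ends included. The function name is hypothetical and it only builds query text for https://query.wikidata.org.

```python
import re

def class_path_query(start: str, ancestor: str, lang: str = "en") -> str:
    """Classes reachable from `start` via subclass of (P279) that also lead up to `ancestor`."""
    for qid in (start, ancestor):
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not an item ID: {qid!r}")
    return f"""
SELECT DISTINCT ?class ?classLabel WHERE {{
  wd:{start} wdt:P279* ?class .     # start, its superclasses, their superclasses, ...
  ?class wdt:P279* wd:{ancestor} .  # ...keeping only classes that lead up to the ancestor
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
""".strip()

query = class_path_query("Q46185", "Q34379")
```

If you only need a yes/no answer, `ASK { wd:Q46185 wdt:P279* wd:Q34379 }` checks membership in the hierarchy without listing the path.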

To view the class hierarchy, external tools like the Wikidata Class Browser or SQID can help; but per Spitz et al. [1], the user experience of the generic tools for selecting an appropriate "instance of" value leaves a fair bit of room for improvement.

Music genres

Let's browse the class hierarchy starting at music genre:

https://tools.wmflabs.org/sqid/#/view?id=Q188451

Oh that's interesting. There are a handful of subclasses of music genre and about 1,200 instances. "Beatdown hardcore" anyone?

Labels, descriptions, and multilingual support

So far, you've mostly seen English labels as a result of the relative dominance of that language. But what happens if there is no English label for a given entity? Let's take a look at the guitar class:

guitar (Q6607)

When Wikidata does not have a label in the requested language, it falls back to the Q or P opaque identifier. If you can understand another language that does have a label and/or description, and have a decent understanding of the subject matter, this is a good opportunity for you to contribute by providing an appropriate label and description in your language!
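That fallback rule can be sketched in a few lines of Python over an entity shaped like Wikidata's JSON. The sample mirrors the situation in the next section for Q1798603 (a French label but no English one); the helper names are hypothetical, and the real interface may also consult language fallback chains (such as a regional variant falling back to its parent language) before showing the bare ID.

```python
def display_label(entity: dict, lang: str) -> str:
    """Label in `lang` if present, otherwise the opaque Q/P identifier."""
    entry = entity.get("labels", {}).get(lang)
    return entry["value"] if entry else entity["id"]

def missing_label_languages(entity: dict, langs: list) -> list:
    """Languages from `langs` that have no label yet: candidates for a contribution."""
    labels = entity.get("labels", {})
    return [lang for lang in langs if lang not in labels]

# Illustrative sample reflecting the state described in the next section.
stringed = {
    "id": "Q1798603",
    "labels": {"fr": {"language": "fr", "value": "instrument à cordes"}},
}
```

With this sample, an English reader sees "Q1798603", while `missing_label_languages(stringed, ["en", "fr", "de"])` points to English and German as gaps to fill.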

Editing a label and description

The French label for Q1798603 is "instrument à cordes" and its description is "Instrument de musique qui permet de tonifier une ou plusieurs chaînes qui sont étirés entre deux points".

A reasonable translation of those might be "stringed instrument: A musical instrument that generates tones by one or more strings stretched between two points". So let's make that happen.

Click the Edit button in the top right corner of the page, beside the description field.

You can immediately edit the label, description, and alias (also known as) fields for this entity in a variety of languages. The defaults appear to be English, French, Italian, and German, but you can change this for your account under the Preferences section.

No reference is needed to support your changes to labels, descriptions, or aliases. It's an easy way to contribute, and it's very valuable!
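The Edit button is the right tool for a handful of changes. If you later want to script translations, the Wikibase API provides `wbsetlabel` and `wbsetdescription` modules (and `wbsetaliases` for aliases). The sketch below only assembles the POST form fields; sending them requires a logged-in session and a CSRF token (obtained with `action=query&meta=tokens`), and high-volume automated editing falls under Wikidata's bot policy. The function name is hypothetical and the token is a placeholder.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"

def term_edit_fields(entity_id, language, value, csrf_token, term="label", summary=""):
    """Form fields for a wbsetlabel or wbsetdescription POST request."""
    if term not in ("label", "description"):
        raise ValueError("term must be 'label' or 'description'")
    return {
        "action": f"wbset{term}",  # wbsetlabel or wbsetdescription
        "id": entity_id,
        "language": language,
        "value": value,
        "summary": summary,
        "token": csrf_token,       # placeholder; never hard-code a real token
        "format": "json",
    }

label_body = urlencode(term_edit_fields("Q1798603", "en", "stringed instrument", "TOKEN_PLACEHOLDER"))
```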

Label guidelines

Per Help:Label, labels should reflect the commonly used means of referring to a given entity in a given language--often matching a corresponding Wikipedia article title. Uniqueness, however, is provided by an entity's opaque identifier (Q### or P###), so Wikidata labels do not have to be unique, unlike Wikipedia article titles. The description field helps humans resolve ambiguities when deciding which of several identically labeled entities they want.

For example, the Northern Lights Festival Boréal is held in Bell Park in Sudbury, Ontario, Canada. A search for "Bell Park" turns up many results, some of which have no description at all. Luckily, we can easily distinguish the entity we want (Q4883225) by its description: "large municipal park in Greater Sudbury, Ontario, Canada".
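The same disambiguation can be done programmatically with the `wbsearchentities` API module, which matches labels and aliases and returns each hit's ID, label, and description. In this sketch only the URL builder targets the real API; the response is a hand-made sample in the module's shape, where the first hit comes from the text above and the second is invented to stand for an undescribed match. Function names are hypothetical.

```python
from urllib.parse import urlencode

def search_url(text: str, lang: str = "en", limit: int = 20) -> str:
    """wbsearchentities request for items whose label or alias matches `text`."""
    params = {"action": "wbsearchentities", "search": text, "language": lang,
              "type": "item", "limit": limit, "format": "json"}
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

def hits_mentioning(response: dict, keyword: str) -> list:
    """IDs of search hits whose description contains `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [hit["id"] for hit in response.get("search", [])
            if keyword in hit.get("description", "").lower()]

sample_response = {
    "search": [
        {"id": "Q4883225", "label": "Bell Park",
         "description": "large municipal park in Greater Sudbury, Ontario, Canada"},
        {"id": "Q_HYPOTHETICAL", "label": "Bell Park"},  # invented hit with no description
    ]
}
```

Filtering on "Sudbury" keeps only Q4883225; hits without a description can never be told apart this way, which is why adding descriptions is so useful.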

Description guidelines

Quoting Help:Description:

Alias guidelines

Aliases should be provided for alternate forms of a label; for example, to provide the scientific name of a plant, or less common usages. These provide alternate access points for a given entity.

For example, "double bass" is the most common label used for Q80019, but there are currently nine English aliases listed, including "bass viol" and "bull fiddle".

Statements

Let's examine some of the statements for an interesting entity like Buffy Sainte-Marie (Q467027).

instance of (P31): human (Q5)

Well that makes sense :) Every entity shows its instance of and subclass of statements at the very top so you can quickly place the entity in its appropriate context.

genre (P136)

Buffy Sainte-Marie's wide range as a musician provides an excellent example of how a single property can have many associated values.

image (P18)

The value must be an image hosted on Wikimedia Commons, a collection of freely licensed media files (https://commons.wikimedia.org/wiki/Commons:Licensing), so that the media can be reused by anyone, anytime, anywhere.

date of birth (P569)

This is one of the first statements whose value is not a link to another Wikidata entity. The image property was special in that it linked to a Wikimedia Commons file, but we were still effectively working with a controlled vocabulary.

Go ahead and click "edit" for date of birth: you will see that a widget pops up to help you format the date in a machine-readable way. You can write in "February 20, 1941" or "1941-02-20" and the widget will reformat the date as "20 February 1941" to show that it understands your entry.
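What the widget stores is not the text you typed but a structured time value: a signed, ISO 8601-like timestamp, a precision code (11 for day, 10 for month, 9 for year), and a calendar model. The function below is a hypothetical and far less forgiving stand-in for the widget: it accepts only the two spellings above plus a bare year, with English month names.

```python
from datetime import datetime

# Calendar model item for the proleptic Gregorian calendar.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"

# (strptime pattern, Wikibase precision): 11 = day, 9 = year.
PATTERNS = (("%Y-%m-%d", 11), ("%B %d, %Y", 11), ("%Y", 9))

def to_wikidata_time(text: str) -> dict:
    """Parse a few common date spellings into a Wikibase time value."""
    for pattern, precision in PATTERNS:
        try:
            d = datetime.strptime(text.strip(), pattern)
        except ValueError:
            continue
        # Year-precision values zero out the month and day fields.
        month, day = (d.month, d.day) if precision == 11 else (0, 0)
        return {
            "time": f"+{d.year:04d}-{month:02d}-{day:02d}T00:00:00Z",
            "timezone": 0, "before": 0, "after": 0,
            "precision": precision,
            "calendarmodel": GREGORIAN,
        }
    raise ValueError(f"unrecognized date: {text!r}")
```

Both "February 20, 1941" and "1941-02-20" produce the same value, with `time` set to `+1941-02-20T00:00:00Z` at day precision.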

spouse (P26)

This property is interesting mostly because it demonstrates the use of qualifiers on statements. In this case, the statement has start time (P580) and end time (P582) qualifiers, enabling humans and machines to understand the timeline of what may be a series of relationships between entities.

As with the date of birth property, the widget formats the date data appropriately.

Statement qualifiers

Under the long list of values for award received (P166) in Buffy Sainte-Marie's entity, note that most of the awards have a point in time (P585) qualifier denoting when she received the award. However, her Governor General’s Performing Arts Award is missing this qualifier.

We can find the award date in the Toronto Star article at https://www.thestar.com/entertainment/music/2010/04/29/buffy_saintemarie_to_get_governor_generals_award.html, so let's add that date (2010).

  1. Click edit beside the statement to which we want to add the qualifier. The interface changes to show us edit mode, including a new add qualifier option.
  2. Click add qualifier and a new qualifier entry box appears, with your cursor already in the text entry box.
  3. Enter "point in time" in the text entry box; autocomplete will let you press the down arrow to select the right entry after you have entered the first few letters, and then you can press tab to move to the date entry widget.
  4. Enter "2010" for the date and click save. Wikidata saves the change to the entity, and the interface returns to display mode.
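Under the hood, those steps attach a qualifier "snak" to the award statement. A sketch of the JSON shape of a year-precision point in time (P585) qualifier, plus a hypothetical helper that attaches it to a statement dict; the award statement is reduced to a stub because the award's item ID is not given here.

```python
CALENDAR = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian

def point_in_time_qualifier(year: int) -> dict:
    """P585 qualifier snak with year precision (9): month and day are zeroed."""
    return {
        "snaktype": "value",
        "property": "P585",
        "datavalue": {
            "type": "time",
            "value": {
                "time": f"+{year:04d}-00-00T00:00:00Z",
                "timezone": 0, "before": 0, "after": 0,
                "precision": 9,
                "calendarmodel": CALENDAR,
            },
        },
    }

def add_qualifier(statement: dict, snak: dict) -> dict:
    """Append a qualifier snak, keeping the qualifiers-order list in sync."""
    prop = snak["property"]
    statement.setdefault("qualifiers", {}).setdefault(prop, []).append(snak)
    order = statement.setdefault("qualifiers-order", [])
    if prop not in order:
        order.append(prop)
    return statement

# Stub for the award received (P166) statement; its value is omitted here.
award_statement = {"mainsnak": {"snaktype": "value", "property": "P166"}}
add_qualifier(award_statement, point_in_time_qualifier(2010))
```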

References

Many statements have no supporting references, or in some cases have a reference, but it is something minimal like "imported from Russian Wikipedia". Much of Wikidata's content was initially bootstrapped by parsing Wikipedia for machine-readable content. If you have time, adding supporting references from authoritative sources to existing statements can be a great contribution.
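Each reference is itself a small set of property-value pairs. The subsections below correspond to commonly used reference properties: stated in (P248), reference URL (P854), title (P1476), publication date (P577), and retrieved (P813). A sketch of a reference block in Wikidata's JSON shape for a web source; the function is hypothetical, covers only some of those properties (publication date would follow the same pattern as retrieved), and expects dates as YYYY-MM-DD.

```python
from typing import Optional

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"

def _snak(prop: str, value_type: str, value) -> dict:
    return {"snaktype": "value", "property": prop,
            "datavalue": {"type": value_type, "value": value}}

def web_reference(url: str, retrieved: str, title: Optional[str] = None,
                  title_lang: str = "en", stated_in: Optional[str] = None) -> dict:
    """Reference with reference URL (P854), retrieved (P813), optional title and stated in."""
    year, month, day = (int(part) for part in retrieved.split("-"))
    snaks = {}
    if stated_in:
        # stated in (P248) points at the item for the publication itself
        snaks["P248"] = [_snak("P248", "wikibase-entityid",
                               {"entity-type": "item", "id": stated_in})]
    snaks["P854"] = [_snak("P854", "string", url)]  # URL-typed values are stored as strings
    if title:
        # title (P1476) is monolingual text, so the value carries a language code
        snaks["P1476"] = [_snak("P1476", "monolingualtext",
                                {"text": title, "language": title_lang})]
    snaks["P813"] = [_snak("P813", "time", {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "timezone": 0, "before": 0, "after": 0, "precision": 11,
        "calendarmodel": GREGORIAN,
    })]
    return {"snaks": snaks, "snaks-order": list(snaks)}

star_ref = web_reference(
    "https://www.thestar.com/entertainment/music/2010/04/29/"
    "buffy_saintemarie_to_get_governor_generals_award.html",
    "2017-06-01",  # hypothetical retrieval date
)
```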

Stated in

Reference URL

Title

Publication date

Retrieved

DuplicateReferences gadget

After you've manually added the same reference a dozen times to support each of a dozen different statements for a given entity, you might get tired of the effort and think "There has to be a better way". It turns out there is; under Preferences -> Gadgets for your account, check the DuplicateReferences box and hit save.

Now when you edit an entity, a copy link is added to each existing reference. Click copy and a new insert reference link will be added to each existing statement. Clicking insert reference will immediately add a copy of the reference to that statement.

Suggested workflow: add all of your statements, then add the initial reference. Once you save the reference, you can copy it and insert it into all of the corresponding statements.

Authority control

Wikidata does not pretend to stand alone; instead, it acknowledges that there are many other sources of structured data via links to those external systems (Help:Statements/Guidelines_for_external_relationships).

Properties that have been created as external identifiers enable you to just add the external ID, and Wikidata will generate the full URL to link to that external entity.
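The mechanism is a template stored on each external-identifier property: its formatter URL (P1630) contains a `$1` placeholder that Wikidata replaces with the identifier. A minimal sketch; the MusicBrainz artist ID (P434) pattern below is assumed, so confirm it on the property page, and the identifier is a made-up UUID.

```python
# Assumed formatter URL (P1630) of MusicBrainz artist ID (P434); check the property page.
MUSICBRAINZ_ARTIST_FORMATTER = "https://musicbrainz.org/artist/$1"

def external_url(formatter: str, identifier: str) -> str:
    """Substitute an external identifier into a formatter URL's $1 placeholder."""
    if "$1" not in formatter:
        raise ValueError("formatter URL has no $1 placeholder")
    return formatter.replace("$1", identifier)

example = external_url(MUSICBRAINZ_ARTIST_FORMATTER,
                       "00000000-0000-0000-0000-000000000000")  # made-up ID
```

This is why you only type the bare identifier: the link is rebuilt from the pattern, so if the external site changes its URL scheme, only the property's formatter URL needs updating.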

For example, the Authority control section of Buffy Sainte-Marie's Wikipedia article is generated by the {{Authority control}} template, which automatically pulls in a subset of the identifiers from the Wikidata item linked to that article.

Exact match vs. External IDs

Exact match (P2888) is "used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably", while external ID properties provide links to records in other systems, such as MusicBrainz.

Music genres

There is an active Wikipedia music genres task force working on improving the organization of music genres in Wikipedia. It's not clear that there is a parallel effort in Wikidata. This might be a good way for someone passionate about music genres to get constructively involved in a deep way with Wikidata!

Meanwhile, we can use the many genres Wikidata already has in our statements.

Representing musicians

genre (P136) captures what music genre(s) (Q188451) the band, individual, or event belongs to / performs / composes (SQID / Tree)

occupation (P106) = musician (Q639669) or one of its subclasses (SQID / Tree)

instrument (P1303) = musical instrument (Q34379) or one of its subclasses (SQID / Tree)

Representing bands

instance of = band (Q215380)

Band members are represented via has part (P527), and you can qualify each statement with start time (P580) and end time (P582) to reflect when members joined or left the band.

inception (P571) records when the band began.
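Because membership dates live on qualifiers, a query has to reach the statement node rather than the simple `wdt:` shortcut. In this sketch, `p:P527` yields the has part statement, `ps:P527` its main value (the member), and `pq:P580` / `pq:P582` its start and end time qualifiers. The band's item ID is a parameter, and the hypothetical function only returns query text.

```python
import re

def band_members_query(band_qid: str, lang: str = "en") -> str:
    """Members of a band via has part (P527), with start/end time qualifiers if present."""
    if not re.fullmatch(r"Q\d+", band_qid):
        raise ValueError(f"not an item ID: {band_qid!r}")
    return f"""
SELECT ?member ?memberLabel ?start ?end WHERE {{
  wd:{band_qid} p:P527 ?statement .          # full statement node for "has part"
  ?statement ps:P527 ?member .               # the statement's main value
  OPTIONAL {{ ?statement pq:P580 ?start . }} # start time qualifier
  OPTIONAL {{ ?statement pq:P582 ?end . }}   # end time qualifier
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
ORDER BY ?start
""".strip()
```

Members without qualifiers still appear (thanks to `OPTIONAL`), just with empty dates, which is a handy way to spot statements worth improving.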

Representing music festivals

Instance of both:

For example, Northern Lights Festival Boréal (Q7058657) is the generic entity, and Northern Lights Festival Boréal 2016 (Q29875240) is a specific instance.

Add an edition number (P393) property to indicate whether this is the 10th, 20th, etc. edition of the festival.

Each specific instance can link to its previous and following event in the overall series using the follows (P155) and followed by (P156) properties.

Use director (P57) to identify the artistic director.

Use multiple participant (P710) statements to identify which musicians or bands participated in the festival. If you know when the participant played, for example for headliners, you can add point in time (P585) qualifiers to those statements.
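Once those statements exist, the festival data becomes queryable. Two sketches using the items named above: one lists the participants (P710) of the 2016 edition (Q29875240) with any point in time qualifier, and one walks the followed by (P156) chain forward from that edition, picking up edition numbers (P393) where present. The function names are hypothetical and only build query text.

```python
import re

def _check(qid: str) -> str:
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not an item ID: {qid!r}")
    return qid

def participants_query(edition: str, lang: str = "en") -> str:
    """Participants (P710) of one festival edition, with any point in time (P585) qualifier."""
    return f"""
SELECT ?participant ?participantLabel ?when WHERE {{
  wd:{_check(edition)} p:P710 ?statement .
  ?statement ps:P710 ?participant .
  OPTIONAL {{ ?statement pq:P585 ?when . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
ORDER BY ?when ?participantLabel
""".strip()

def later_editions_query(edition: str, lang: str = "en") -> str:
    """This edition and every later one reachable through followed by (P156)."""
    return f"""
SELECT ?edition ?editionLabel ?number WHERE {{
  wd:{_check(edition)} wdt:P156* ?edition .
  OPTIONAL {{ ?edition wdt:P393 ?number . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
""".strip()

lineup = participants_query("Q29875240")
series = later_editions_query("Q29875240")
```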

You may need to create new Wikidata items for some of the participants so that you can link to them. For example, in https://www.sudbury.com/lifestyle/its-here-northern-lights-festival-back-for-2015-256336 there are two rich paragraphs listing the lineup, but only a subset of these artists are currently in Wikidata (those in bold are not):

Also on the bill for this year's festival are Lee Harvey Osmond, Quique Escamilla, J.P. Cormier, Five Alarm Funk, Adonis Puentes, The Bombadils, Les Poules a Colin, Pistol George Warren, House of David Gang, Jayme Stone's Alan Lomax Project, Night Terrors, Alfie Smith, Bustamento and Drumhand.

Northern Lights also features many local acts, including Dayv Poulin, Edouard Landry, Billy John & The Irish Wake, Sean Barrette, Tuba Boy, The Wild Geese, Hello Holiday and Lisa Marie Naponse.


[1] Spitz, A., Dixit, V., Richter, L., Gertz, M., & Geiß, J. (2016, April). State of the Union: A Data Consumer’s Perspective on Wikidata and Its Properties for the Classification and Resolution of Entities. In Wiki Workshop at ICWSM.