Investigating Wikibase

TIB research report

Dan Scott <https://dscott.ca/#i>

Associate Librarian, Laurentian University

2018-07-24

Wikibase

Open source (GPL v2 or later) project on which Wikidata is built
It scales!
- 50M items
- 5,000 properties
- 450M claims

Why Wikibase?

It exists now
Designed for collaboratively building and maintaining a dataset
Active development community
- Extensions and development efforts might be embraced
- Skills learned may be reused in other projects
Good platform for rapid prototyping

Core features built-in

Rich data model
Basic item, property, and claim editor interface
Per-page discussion and revision history
Authentication and authorization controls
Multilingual labels, descriptions, search
Self-hosting ontology
Content negotiation for per-item or bulk RDF dumps
MediaWiki API with bindings in Python, Java, JavaScript, etc.
Integration with OpenRefine 3.0
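
As a rough illustration of the per-item RDF and API entry points listed above, the sketch below only builds request URLs; it sends nothing. The host `wikibase.example.org` is hypothetical, and the `/wiki/` and `/w/api.php` paths assume a default MediaWiki layout that a given install may not share.

```python
from urllib.parse import urlencode

# Hypothetical host; replace with the base URL of an actual Wikibase install.
BASE = "https://wikibase.example.org"


def entity_data_url(entity_id, fmt="ttl"):
    """URL for one entity via Special:EntityData; the extension picks the format."""
    return f"{BASE}/wiki/Special:EntityData/{entity_id}.{fmt}"


def wbgetentities_url(entity_ids, languages=("en",)):
    """URL for the wbgetentities module of the MediaWiki API."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),        # several IDs are pipe-separated
        "languages": "|".join(languages),   # limits labels and descriptions
        "format": "json",
    }
    return f"{BASE}/w/api.php?{urlencode(params)}"
```

Calling `entity_data_url("Q42")` yields a Turtle URL for a single item, while `wbgetentities_url(["Q1", "Q2"])` asks the API for two entities at once.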

SPARQL endpoint

Built on Blazegraph
Autocomplete query editor
Various visualizations (maps, graphs, etc.)
Loosely coupled via RDF updates
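
A minimal sketch of how a client might prepare a query for the SPARQL endpoint, assuming the public Wikidata endpoint URL; a self-hosted Wikibase would expose its own address. The query uses Wikidata's own identifiers (`P31`, `Q146`), which will not exist in a fresh install, and the `User-Agent` string is made up for the example. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public Wikidata query service; a local install differs.
ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of (P31) house cat (Q146), with English labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request asking for SPARQL results as JSON."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "wikibase-notes-example/0.1",  # hypothetical client name
    }
    return Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would run the query; that step is left out so the sketch works offline.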

12 built-in datatypes

External identifier	Geographic coordinates
Geographic shape	Item
Monolingual text	Point in time
Property	Quantity
String	URL
Media file*	Tabular data*

* maintained in Wikimedia Commons
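
For scripting against the API, it can help to know how the datatypes in the table above appear in Wikibase's JSON. The sketch below maps each display name to a datatype identifier and the value type that carries it. The identifiers are recalled from the Wikibase JSON format rather than taken from these slides, so they should be checked against the installed version.

```python
# Display name -> (datatype identifier, value type) in Wikibase JSON.
# Assumption: identifiers as recalled from the Wikibase JSON format.
DATATYPES = {
    "External identifier": ("external-id", "string"),
    "Geographic coordinates": ("globe-coordinate", "globecoordinate"),
    "Geographic shape": ("geo-shape", "string"),
    "Item": ("wikibase-item", "wikibase-entityid"),
    "Monolingual text": ("monolingualtext", "monolingualtext"),
    "Point in time": ("time", "time"),
    "Property": ("wikibase-property", "wikibase-entityid"),
    "Quantity": ("quantity", "quantity"),
    "String": ("string", "string"),
    "URL": ("url", "string"),
    "Media file": ("commonsMedia", "string"),
    "Tabular data": ("tabular-data", "string"),
}


def by_value_type(datatypes=DATATYPES):
    """Group display names by the value type that stores them."""
    groups = {}
    for name, (_identifier, value_type) in datatypes.items():
        groups.setdefault(value_type, []).append(name)
    return groups
```

Grouping this way shows that many datatypes share the plain string value type, so a client needs far fewer value parsers than there are datatypes.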

Community

Channels on IRC, mailing lists, Wikidata itself, Phabricator (project management software), Discourse
- Initial response was slow: prep + unknown newbie == limited cycles?
- Weekly Technical Advice IRC Meeting worked
Docker images via docker-compose (8 images in all)
- Images were a release behind, so I updated them

Minimal ORKG infrastructure

A data model for semantically representing scholarly communication
A scalable graph-storage backend infrastructure […] exposing a comprehensive API
User interface widgets and components for collaborative authoring and curation of the graph
Semi-automated semantic integration, search, extraction and recommendation services

Auer, Sören et al. (2018). Towards an Open Research Knowledge Graph. https://doi.org/10.5281/zenodo.1157185

Wikibase for ORKG?

Data model lacks RDF for every revision
- Updated RDF is loaded every ten seconds; retain every change with provenance?
Backend is scalable, generates RDF, has an API
Authoring UI satisfies some use cases
SPARQL endpoint arguably supports semantic integration, search, and extraction
- Recommendation service (beyond autocomplete) needs work
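
One way to picture "retain every change with provenance" is a store that keeps the RDF for each revision alongside who made it and when, instead of only the latest state. The sketch below is purely hypothetical: `Revision` and `RevisionStore` are names invented here, triples are plain tuples, and nothing in it reflects how Wikibase or ORKG actually stores data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """One saved state of an entity, with minimal provenance (hypothetical)."""
    entity_id: str
    revision_id: int
    timestamp: str        # ISO 8601 string
    editor: str
    triples: frozenset    # set of (subject, predicate, object) tuples


@dataclass
class RevisionStore:
    """Keeps every revision per entity, ordered by revision ID."""
    _by_entity: dict = field(default_factory=dict)

    def add(self, rev):
        revs = self._by_entity.setdefault(rev.entity_id, [])
        revs.append(rev)
        revs.sort(key=lambda r: r.revision_id)

    def history(self, entity_id):
        return list(self._by_entity.get(entity_id, []))

    def diff(self, entity_id, old_id, new_id):
        """Return (added, removed) triples between two revisions."""
        revs = {r.revision_id: r for r in self.history(entity_id)}
        old, new = revs[old_id].triples, revs[new_id].triples
        return new - old, old - new
```

With every revision retained, `diff` can answer which statements an edit added or removed, and `history` exposes the editor and timestamp behind each state.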

Other Wikibase concerns

Limited support for expressing RDFS or OWL ontologies
- An ontology could be hardcoded into the RDF output
- Or support for owl:equivalentProperty, etc. could follow canonicalUriProperty approach
Limited adoption by independent projects
- A showcase opportunity for ORKG?

owl:complementOf, owl:onProperty, owl:DatatypeProperty, owl:ObjectProperty, owl:Restriction, owl:Thing, rdf:type
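
To make the owl:equivalentProperty idea concrete, the sketch below emits N-Triples lines that link local properties to external vocabulary terms. It is an assumption-laden illustration, not existing Wikibase behaviour: the entity base URI, the property IDs, and the mapping to Dublin Core terms are all invented for the example.

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"


def equivalent_property_triples(mapping, entity_base):
    """Return N-Triples lines declaring local properties equivalent to external ones.

    mapping: local property ID -> external property URI (hypothetical input).
    entity_base: URI prefix under which the install mints entity URIs.
    """
    lines = []
    for pid, external_uri in sorted(mapping.items()):
        lines.append(
            f"<{entity_base}{pid}> <{OWL_EQUIVALENT_PROPERTY}> <{external_uri}> ."
        )
    return lines


# Hypothetical local properties mapped to Dublin Core terms.
EXAMPLE_MAPPING = {
    "P1": "http://purl.org/dc/terms/title",
    "P2": "http://purl.org/dc/terms/creator",
}
```

Calling `equivalent_property_triples(EXAMPLE_MAPPING, "http://wikibase.example.org/entity/")` produces two lines that could be appended to an RDF dump or loaded into the triple store separately.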

ORKG observations

Premises

Current focus is on the technology stack
Designing, building, testing, and documenting a full stack is a huge amount of work
Prototypes would help validate design decisions and guide implementation efforts

Open issues? (1/2)

Granularity of statements to be added to the graph
Just URIs for publications, or more?
- If more, reuse an existing publication ontology or mint a new one?
11 relations in ORKG, or CiTO's roughly 40 citation types, etc.
Vocabulary for classifying research areas: ANZSRC? (for example, Computer vision)
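
The granularity question can be seen in a single statement: a bare link between two publication URIs versus a typed citation. The sketch below builds such a triple using the CiTO namespace; the publication URIs are placeholders, and `citation_triple` is a helper invented for this example.

```python
# Namespace of the Citation Typing Ontology (CiTO).
CITO = "http://purl.org/spar/cito/"


def citation_triple(citing, cited, relation="cites"):
    """Return a (subject, predicate, object) triple typed with a CiTO relation."""
    return (citing, CITO + relation, cited)


# Placeholder publication URIs, not real works.
PAPER_A = "https://example.org/paper/a"
PAPER_B = "https://example.org/paper/b"

generic = citation_triple(PAPER_A, PAPER_B)             # untyped "cites"
typed = citation_triple(PAPER_A, PAPER_B, "extends")    # a more specific type
```

The generic form is cheap to capture at submission time, while the typed form asks the contributor or an automated service to decide which relation applies.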

Open issues? (2/2)

Data entry techniques need different prototypes:
- Form linked from conference / article submission
- Retrospective conversion
- Automated vs. manual approaches
- Curation: creation vs. editing vs. administration
Output prototypes
- Visualizations of research domains
- Research area summaries and trends
- Generated systematic reviews