Investigating Wikibase

TIB research report

Dan Scott <https://dscott.ca/#i>

Associate Librarian, Laurentian University

2018-07-24

Wikibase

  • Open source (GPL v2 or later) project on which Wikidata is built
  • It scales: Wikidata itself runs on it

Why Wikibase?

  • It exists now
  • Designed for collaboratively building and maintaining a dataset
  • Active development community
    • Extensions and development efforts might be embraced
    • Skills learned may be reused in other projects
  • Good platform for rapid prototyping

Core features built-in

  • Rich data model
  • Basic item, property, and claim editor interface
  • Per-page discussion and revision history
  • Authentication and authorization controls
  • Multilingual labels, descriptions, search
  • Self-hosting ontology
  • Content negotiation for per-item or bulk RDF dumps
  • MediaWiki API with bindings in Python, Java, JavaScript, etc.
  • Integration with OpenRefine 3.0
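
A minimal sketch of talking to the MediaWiki API without any binding, assuming a hypothetical self-hosted instance at wikibase.example.org. The `wbgetentities` action and its `ids`, `languages`, `props`, and `format` parameters are part of the Wikibase API; the host name and both helper functions are illustrative only.

```python
from urllib.parse import urlencode

# Assumption: hypothetical self-hosted Wikibase; substitute your own api.php URL.
API_URL = "https://wikibase.example.org/w/api.php"


def entity_request_url(entity_ids, languages=("en",), api_url=API_URL):
    """Build a wbgetentities request URL for one or more entity IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),          # several IDs are pipe-separated
        "languages": "|".join(languages),     # restrict labels/descriptions
        "props": "labels|descriptions|claims",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def label_of(entity, language="en"):
    """Return one label from an entity record, or None if it is missing."""
    return entity.get("labels", {}).get(language, {}).get("value")


if __name__ == "__main__":
    # Prints the URL only; fetching it is left to the HTTP client of your choice.
    print(entity_request_url(["Q1", "P2"]))
```

The response nests each entity under `entities`, keyed by ID, so `label_of(response["entities"]["Q1"])` would pull out the English label.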

SPARQL endpoint

  • Built on Blazegraph
  • Autocomplete query editor
  • Various visualizations (maps, graphs, etc.)
  • Loosely coupled via RDF updates
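
A sketch of querying the endpoint over the standard SPARQL 1.1 Protocol, assuming a hypothetical endpoint URL for a self-hosted query service. The request is prepared but not sent, and the second helper flattens a SPARQL JSON results document; both function names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: hypothetical endpoint; a real deployment will use its own URL.
SPARQL_URL = "https://wikibase.example.org/query/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
LIMIT 10
"""


def sparql_request(query, endpoint=SPARQL_URL):
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    print(sparql_request(QUERY).full_url)
```

Passing the prepared request to `urllib.request.urlopen` and the decoded JSON to `bindings_to_rows` would give one dict per result row.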

11 built-in datatypes

  • External identifier
  • Geographic coordinates
  • Geographic shape
  • Item
  • Monolingual text
  • Point in time
  • Property
  • Quantity
  • String
  • URL
  • Media file*
  • Tabular data*

* maintained in Wikimedia Commons
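
To make the datatypes concrete, here is a sketch of how a few of them appear as `datavalue` objects in Wikibase's JSON serialization, as far as I understand that format. The builder functions are hypothetical helpers, not part of any library. Several datatypes (string, URL, external identifier) share the plain `string` value type, so only structurally distinct shapes are shown.

```python
# Proleptic Gregorian calendar, the usual calendar model for time values.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def string_value(text):
    """String, URL and external identifier values all use this shape."""
    return {"type": "string", "value": text}


def monolingual_text(text, language):
    return {"type": "monolingualtext",
            "value": {"text": text, "language": language}}


def quantity(amount, unit="1"):
    """Amounts are signed decimal strings; unit "1" means unitless."""
    return {"type": "quantity",
            "value": {"amount": f"{amount:+}", "unit": unit}}


def point_in_time_day(iso_date):
    """A day-precision time value (precision 11) from a YYYY-MM-DD date."""
    return {"type": "time",
            "value": {"time": f"+{iso_date}T00:00:00Z", "timezone": 0,
                      "before": 0, "after": 0, "precision": 11,
                      "calendarmodel": GREGORIAN}}


def item_value(item_id):
    """A reference to another item, e.g. "Q42"."""
    return {"type": "wikibase-entityid",
            "value": {"entity-type": "item",
                      "numeric-id": int(item_id[1:]), "id": item_id}}


if __name__ == "__main__":
    print(point_in_time_day("2018-07-24"))
```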

Community

  • Channels on IRC, mailing lists, Wikidata itself, Phabricator (project management software), Discourse
  • Docker images via docker-compose (8 images in all)

Minimal ORKG infrastructure

  1. A data model for semantically representing scholarly communication
  2. A scalable graph-storage backend infrastructure […] exposing a comprehensive API
  3. User interface widgets and components for collaborative authoring and curation of the graph
  4. Semi-automated semantic integration, search, extraction and recommendation services

Auer, Sören et al. (2018). Towards an Open Research Knowledge Graph. https://doi.org/10.5281/zenodo.1157185

Wikibase for ORKG?

  1. Data model lacks RDF for every revision
    • Updated RDF is loaded every ten seconds; retain every change with provenance?
  2. Backend is scalable, generates RDF, has an API
  3. Authoring UI satisfies some use cases
  4. SPARQL endpoint arguably supports semantic integration, search, and extraction
    • Recommendation service (beyond autocomplete) needs work
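
One possible way to approach the per-revision RDF gap in point 1, sketched under assumptions: the MediaWiki revision history already records who changed what and when, and `Special:EntityData` accepts a `revision` parameter, so RDF for each revision could be harvested after the fact. The host name, page title, and helper names below are hypothetical, and this is not how Wikibase itself feeds the query service.

```python
from urllib.parse import urlencode

# Assumption: hypothetical self-hosted Wikibase.
WIKI_BASE = "https://wikibase.example.org"


def revisions_query_url(page_title, limit=50):
    """Build a MediaWiki API URL listing a page's revisions with provenance."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": page_title,              # e.g. "Item:Q1"; namespace varies by setup
        "rvprop": "ids|timestamp|user",
        "rvlimit": limit,
        "format": "json",
    }
    return f"{WIKI_BASE}/w/api.php?" + urlencode(params)


def revision_rdf_urls(entity_id, revisions):
    """Pair each revision's provenance with a URL for its Turtle serialization."""
    return [
        (rev["timestamp"], rev["user"],
         f"{WIKI_BASE}/wiki/Special:EntityData/{entity_id}.ttl?revision={rev['revid']}")
        for rev in revisions
    ]


if __name__ == "__main__":
    sample = [{"revid": 7, "timestamp": "2018-07-24T10:00:00Z", "user": "Example"}]
    for row in revision_rdf_urls("Q1", sample):
        print(row)
```

Each harvested document could then be loaded into its own named graph, which would keep every change queryable alongside its author and timestamp.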

Other Wikibase concerns

  • Limited support for expressing RDFS or OWL ontologies
    • An ontology could be hardcoded into the RDF output
    • Or support for owl:equivalentProperty, etc., could follow the canonicalUriProperty approach
  • Limited adoption by independent projects
    • A showcase opportunity for ORKG?
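
As a rough illustration of the first workaround, the sketch below generates `owl:equivalentProperty` triples in Turtle from a mapping kept outside Wikibase. The property IDs, the property base URI, and the choice of Dublin Core targets are all made-up examples, not an existing Wikibase feature.

```python
OWL_PREFIX = "@prefix owl: <http://www.w3.org/2002/07/owl#> ."


def equivalent_property_turtle(mappings, property_base):
    """Emit one owl:equivalentProperty triple per local property ID."""
    lines = [OWL_PREFIX, ""]
    for local_id, external_uri in sorted(mappings.items()):
        lines.append(
            f"<{property_base}{local_id}> owl:equivalentProperty <{external_uri}> ."
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical mapping from local property IDs to external vocabulary terms.
    mappings = {
        "P1": "http://purl.org/dc/terms/title",
        "P2": "http://purl.org/dc/terms/creator",
    }
    print(equivalent_property_turtle(
        mappings, "https://wikibase.example.org/prop/direct/"))
```

The resulting Turtle could be appended to an RDF dump or loaded into the triple store alongside the data Wikibase generates.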

ORKG observations

Premises

  • Current focus is on the technology stack
  • Designing, building, testing, and documenting a full stack is a huge amount of work
  • Prototypes would help validate design decisions and guide implementation efforts

Open issues? (1/2)

  • Granularity of statements to be added to the graph
  • Just URIs for publications, or more?
    • If more, reuse an existing publication ontology or mint a new one?
  • 11 relations in ORKG, or CITO's roughly 40 citation types, etc.
  • Vocabulary for classifying research areas: ANZSRC? (for example, Computer vision)

Open issues? (2/2)

  • Data entry techniques need different prototypes:
    • Form linked from conference / article submission
    • Retrospective conversion
    • Automated vs. manual approaches
    • Curation: creation vs. editing vs. administration
  • Output prototypes
    • Visualizations of research domains
    • Research area summaries and trends
    • Generated systematic reviews