
Work plan and existing R&D

Simon Worthington edited this page Mar 25, 2024 · 7 revisions

Work plan

Goal

  • Demonstrate the use of Computational Publishing (Jupyter Notebooks) for publishing from cultural heritage collections using Wikidata, Wikibase, and Linked Open Data. The example is an exhibition catalogue.

Challenges (New additions)

  • Zero-install setup: using VMs and runtime environments in the browser, such as Google Colab and GitHub Codespaces
  • High-quality automated multi-format typesetting: PDFs using LaTeX, later moving to CSS Paged Media
  • Moving from Wikidata to Wikibase: Wikibase has been used, but not extensively
  • New Notebooks:
    • Text from the HTML DOM via Wikibase
    • Data analysis using Neo4j
  • Focus on managing the publication lifecycle
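Moving a query from Wikidata to a Wikibase instance mostly means changing three things: the endpoint, the entity/property namespaces, and the property IDs, because a local Wikibase has its own P- and Q-numbers. The following is a minimal sketch that declares the `wd:`/`wdt:` prefixes explicitly, so one query body can run against either target. The Wikibase URL, its `/query/sparql` endpoint and its property IDs (`P1`, `P5`) are hypothetical placeholders, and the sketch assumes the instance uses the default Wikibase concept-URI pattern (`/entity/`, `/prop/direct/`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparqlTarget:
    """Where a query runs and how its entity/property IRIs are formed."""
    endpoint: str
    concept_base: str           # e.g. "http://www.wikidata.org" for Wikidata
    properties: dict[str, str]  # logical name -> property ID on this instance

    def prefixes(self) -> str:
        # Declare wd:/wdt: explicitly so the same query body works on any target.
        return (
            f"PREFIX wd: <{self.concept_base}/entity/>\n"
            f"PREFIX wdt: <{self.concept_base}/prop/direct/>\n"
        )

    def query(self, body_template: str) -> str:
        # Replace logical property names such as {image} with this instance's IDs.
        return self.prefixes() + body_template.format(**self.properties)


WIKIDATA = SparqlTarget(
    endpoint="https://query.wikidata.org/sparql",
    concept_base="http://www.wikidata.org",
    properties={"instance_of": "P31", "image": "P18"},
)

# Hypothetical Wikibase instance: URL, endpoint and property IDs are placeholders.
LOCAL_WIKIBASE = SparqlTarget(
    endpoint="https://example.wikibase.cloud/query/sparql",
    concept_base="https://example.wikibase.cloud",
    properties={"instance_of": "P1", "image": "P5"},
)

# Doubled braces are literal SPARQL braces; {image} is filled per target.
BODY = "SELECT ?item ?image WHERE {{ ?item wdt:{image} ?image . }} LIMIT 10"

if __name__ == "__main__":
    print(WIKIDATA.query(BODY))
    print(LOCAL_WIKIBASE.query(BODY))
```

Keeping the property mapping in data, rather than hard-coding IDs in each query, is one way to make the Painting Collection notebook portable between Wikidata and a project Wikibase.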

Objectives

  • Publishing media of a digital heritage collection from Wikidata and/or Wikibase
  • Use a SPARQL query to script which media and data are selected
  • Demo editing the SPARQL query to sample a different collection's data
  • Authoring using a Jupyter Notebook in Google Colab and GitHub Codespaces, or equivalent
  • Show multi-format output
  • Show a Jupyter Notebook featuring data analysis
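Selecting media with SPARQL and re-targeting a different collection can be as small as parameterising the collection item. The following is a minimal sketch that assumes the Wikidata Query Service and the standard Wikidata properties P31 (instance of), P195 (collection) and P18 (image), together with Q3305213 (painting). `fetch_paintings` needs network access, and the User-Agent string is a placeholder.

```python
import json
import re
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def painting_query(collection_qid: str, limit: int = 20) -> str:
    """Paintings (Q3305213) held in one collection (P195) that have an image (P18)."""
    if not re.fullmatch(r"Q\d+", collection_qid):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    return f"""SELECT ?painting ?paintingLabel ?image WHERE {{
  ?painting wdt:P31 wd:Q3305213 ;
            wdt:P195 wd:{collection_qid} ;
            wdt:P18 ?image .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}"""


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def fetch_paintings(collection_qid: str, limit: int = 20) -> list[dict[str, str]]:
    """Run the query against the Wikidata Query Service (requires network access)."""
    params = urllib.parse.urlencode({"query": painting_query(collection_qid, limit)})
    request = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients for a descriptive User-Agent; this one is a placeholder.
            "User-Agent": "cps-demo-sketch/0.1 (contact@example.org)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return rows(json.load(response))
```

For the "sample a different collection" demo, only the QID argument changes. The query body stays the same.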

Tasks

  • Make the four Jupyter Notebooks:
    • Copyright Page (Impressum) - this needs expanding to get more fields
    • Essay - how to collect an HTML document based on a search and the DOM
    • Painting Collection - move from Wikidata to Wikibase
    • Data analysis - choose a popular DH option, Neo4j-based
  • Use Google Colab and GitHub Codespaces, or equivalent
    • Currently, the system runs locally
    • Using a runtime environment like Google Colab means the user does not have to install anything. NB: There are many VM options; Colab is used for demonstration purposes only.
  • Style outputs
    • Only default styles have been used to date
    • Styles are needed for: Web (Bootstrap), PDF (LaTeX), ePub (CSS), DOCX (Word template)
  • Digital publishing good practice
    • Packaging: Web Publication Manifest, JATS, validation, etc.
    • Archiving and long-term preservation (LTP)
    • Open science practice: PIDs, deposit, software citation
  • Workflow for publishers, institutions, scholars
  • Demo, presentation slide deck, future work plan outline
  • Review complete list of notebooks, guides and classes
  • Review CPS proposal based on prototyping
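For the Essay notebook, collecting text "based on the DOM" can start with pulling paragraph text out of a page's main content element. The following is a minimal sketch using only Python's standard-library `html.parser`. It assumes that `<article>` is the content container, and it leaves out the search step that locates the page.

```python
from html.parser import HTMLParser


class ArticleParagraphs(HTMLParser):
    """Collect the text of <p> elements that sit inside an <article> element."""

    def __init__(self) -> None:
        super().__init__()
        self.article_depth = 0      # > 0 while inside <article>
        self.in_paragraph = False
        self.current: list[str] = []
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "p" and self.article_depth:
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Join text fragments (including inline tags like <em>) and normalise whitespace.
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.current.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ArticleParagraphs()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Paragraphs outside `<article>`, such as navigation or footers, are ignored. A real source page may need a different container selector.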
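The packaging task names a Web Publication Manifest. This sketch assumes the Readium Web Publication Manifest format is meant; the W3C Publication Manifest is a related but different specification. Under that assumption, a minimal manifest for the HTML output could be generated as follows. The title, identifier and URLs are placeholders, and validation against the Readium JSON Schema is a separate step.

```python
import json


def webpub_manifest(title: str, identifier: str, language: str,
                    self_url: str, chapters: list[tuple[str, str]]) -> dict:
    """Build a minimal Readium-style Web Publication Manifest.

    chapters: (href, title) pairs, in reading order.
    """
    return {
        "@context": "https://readium.org/webpub-manifest/context.jsonld",
        "metadata": {
            "@type": "http://schema.org/Book",
            "title": title,
            "identifier": identifier,
            "language": language,
        },
        "links": [
            {"rel": "self", "href": self_url, "type": "application/webpub+json"}
        ],
        "readingOrder": [
            {"href": href, "type": "text/html", "title": chapter_title}
            for href, chapter_title in chapters
        ],
    }


if __name__ == "__main__":
    manifest = webpub_manifest(
        title="Example Exhibition Catalogue",          # placeholder
        identifier="https://doi.org/10.0000/example",  # placeholder
        language="en",
        self_url="https://example.org/catalogue/manifest.json",
        chapters=[("index.html", "Imprint"), ("paintings.html", "Paintings")],
    )
    print(json.dumps(manifest, indent=2))
```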

Existing R&D

Prototypes, documentation, classes, publications, presentations

Existing prototypes have been built using the Quarto framework, which wraps Jupyter Notebooks and provides multi-format output. The Notebooks contain SPARQL queries made in Wikidata/Wikibase, with Python code written around them to process the results. Prototypes were made between the end of 2022 and August 2023 in cooperation with the COPIM research project, which ended in early 2023. Related prototyping has continued in the #semanticClimate project, but this work is currently not directly relevant to the demo, as it concerns corpus text retrieval, TDM, semantic annotation, automated literature reviews, knowledge graphs, and ML/NLP. Below is a sample of prototypes and class guides.
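The Python written around a query typically turns result rows into Markdown, which Quarto then renders into every output format. The following is a minimal sketch that assumes rows are plain dicts with hypothetical `paintingLabel` and `image` keys. In a Quarto code cell, the cell option `#| output: asis` tells Quarto to treat printed text as raw Markdown in the document rather than as a code output.

```python
#| output: asis
# (Quarto cell option above: printed text is inserted as raw Markdown.)


def rows_to_markdown(rows: list[dict[str, str]], width: str = "60%") -> str:
    """Render each result row as a Markdown image with its label as caption."""
    blocks = []
    for row in rows:
        image = row.get("image")
        if not image:
            continue  # skip rows that have no media
        label = row.get("paintingLabel", "Untitled")
        # {width=...} is Pandoc/Quarto image-attribute syntax.
        blocks.append(f"![{label}]({image}){{width={width}}}")
    return "\n\n".join(blocks)


sample_rows = [
    {"paintingLabel": "Example painting", "image": "https://example.org/a.jpg"},
    {"paintingLabel": "Row without image"},
]
print(rows_to_markdown(sample_rows))
```

Labels are not escaped here. A label containing Markdown characters such as `]` would need escaping in production use.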

Prototype Notebooks

(Nov 2022 onwards)

Teaching Quarto and Wikidata

Template for teaching Quarto and Wikidata in class, Sept 28, 2023 (this example is used as the basis for the CPS demo)

  • Repo URL: https://github.com/NFDI4Culture/catalogue-003
  • Notebooks: Publication imprint collection from Thoth.pub API; Painting media and data from Wikidata.
  • Outputs: Web, PDF, ePub, DOCX
  • Tech stack: Quarto, Jupyter Notebooks, Python, SPARQL query, VSCode, GitHub, GitHub Pages, Wikidata, Thoth.pub API, Zenodo.
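In a Quarto project, the same notebook source is rendered once per target format. The following is a minimal sketch that drives the Quarto CLI from Python. `index.qmd` is a placeholder file name. In practice, the formats are more often listed under `format:` in the document's YAML header and produced with a single `quarto render`.

```python
import shutil
import subprocess

# The outputs listed above (Web, PDF, ePub, DOCX) as Quarto format names.
FORMATS = ["html", "pdf", "epub", "docx"]


def render_commands(source: str, formats: list[str] = FORMATS) -> list[list[str]]:
    """One `quarto render <source> --to <format>` command per output format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


def render_all(source: str) -> None:
    """Run the commands; needs Quarto on PATH (and a LaTeX install for PDF)."""
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto not found on PATH")
    for cmd in render_commands(source):
        subprocess.run(cmd, check=True)  # stop at the first failing format
```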

Computational Publishing for Collections

Computational Publication: Computational Publishing for Collections - ADA CP Prototype #1 - Nov 22

  • Repo URL: https://github.com/NFDI4Culture/cp4c/
  • Notebooks: Image from linked open data API (Wikidata); Linked open data query using SPARQL (Wikibase); 3D model with annotations (Wikibase, Semantic Kompakt); QR code generation; Embedded video (TIB AV Portal iframe); Query ORCID for works authored by a person; Linked open data API testing (Wikibase).
  • Outputs: Web, PDF, ePub, DOCX
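The "Query ORCID for works authored by a person" notebook has two parts that can be tested offline. The first checks an iD's check digit, which uses ISO 7064 MOD 11-2 as documented by ORCID. The second reads titles out of the public API's works response. The following is a minimal sketch against the ORCID public API v3.0 `/works` endpoint. The response shape that `work_titles` expects reflects the v3.0 JSON as understood here and should be checked against the API documentation.

```python
import json
import urllib.request


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


def work_titles(works_json: dict) -> list[str]:
    """Take the first summary title from each work group in a v3.0 /works response."""
    titles = []
    for group in works_json.get("group", []):
        summaries = group.get("work-summary", [])
        if summaries and summaries[0].get("title"):
            titles.append(summaries[0]["title"]["title"]["value"])
    return titles


def fetch_works(orcid: str) -> list[str]:
    """Query the ORCID public API (requires network access)."""
    if not orcid_checksum_ok(orcid):
        raise ValueError(f"invalid ORCID iD: {orcid}")
    request = urllib.request.Request(
        f"https://pub.orcid.org/v3.0/{orcid}/works",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return work_titles(json.load(response))
```

`0000-0002-1825-0097` is ORCID's documented example iD and passes the check. Validating before the request avoids wasted API calls on mistyped iDs.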

ScholarLed - Open Access Presses catalogue

ScholarLed - Open Access Presses catalogue: a book catalogue for a group of OA publishers

Made by Simon Bowie of COPIM for the publisher group ScholarLed, based on ideas developed during the NFDI4C and COPIM partnership. The variation here was to have a CI process generate a new website once a day. This went into live production in 2023.

https://github.com/SimonXIX/scholarled_catalogue

Notebook Library

Notebook Library (July 2023)

Benchmark test Notebooks for use in classes and for demonstration purposes.

Guides and classes

FSCI 2023: Publishing from Collections

Class Guide: FSCI 2023's E08 Publishing from Collections Using Linked Open Data Source and Computational Publishing Pipelines. 2023-07-28 v1.1

https://nfdi4culture.github.io/FSCI-Class-Publishing-from-Collections/

OpenKnowledge23: Automating Exhibition Catalogue Creation

Automating Exhibition Catalogue Creation - A Guide. 2023-03-28 v1.1

https://nfdi4culture.github.io/automating-exhibition-catalogue-creation-guide/

OpenKnowledge23 class: https://de.wikiversity.org/wiki/OpenKnowledge23

Tech stack

The objective is a tech stack that is open source, adheres to digital sovereignty guidelines, and is ethically in line with democratic values, the UN Charter of Human Rights, and Open Science principles, practice, and values (UNESCO).