Work plan and existing R&D

Work plan

Goal

Demonstrate the use of Computational Publishing (Jupyter Notebooks) for publishing from cultural heritage collections using Wikidata, Wikibase, and Linked Open Data. The worked example is an exhibition catalogue.

Challenges (New additions)

Zero-install setup: using VMs and in-browser runtime environments such as Google Colab and GitHub Codespaces
High-quality automated multi-format typesetting: PDFs using LaTeX, moving later to CSS Paged Media
Moving from Wikidata to Wikibase: Wikibase has been used so far, but not extensively
New Notebooks:
Text from the HTML DOM via Wikibase
Data analysis using Neo4j
Focus on managing the publication lifecycle
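
The "Text from the HTML DOM" notebook listed above could start from something like the following. This is a minimal sketch using only Python's standard library `html.parser`. The class name `TextExtractor`, the function `extract_text`, and the sample HTML are assumptions made for illustration. How the HTML is obtained (for example from a Wikibase-driven search) is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the text chunks of an HTML document in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample document, made up for this sketch.
SAMPLE_HTML = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Essay</h1><p>First  paragraph.</p>"
    "<script>var x = 1;</script></body></html>"
)

print(extract_text(SAMPLE_HTML))
```

A dedicated DOM library may be a better fit for real pages. The standard-library parser is used here only so that the sketch runs without extra installation.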

Objectives

Publishing media of a digital heritage collection from Wikidata and/or Wikibase
Use a SPARQL query to script which media and data are selected
Demo editing the SPARQL query to sample a different collection's data
Authoring using a Jupyter Notebook in Google Colab and GitHub Codespaces, or equivalent
Show multi-format outputting
Show a data analysis featured Jupyter Notebook
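
The SPARQL-driven selection in the objectives above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the function name `build_painting_query`, the query template, and the example collection QIDs are assumptions introduced here. The Wikidata identifiers in the template (P31 instance of, Q3305213 painting, P195 collection, P18 image) are standard, but the collection QIDs should be checked on Wikidata before use.

```python
import re

# P31 = instance of, Q3305213 = painting, P195 = collection, P18 = image.
# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = """
SELECT ?item ?itemLabel ?image WHERE {{
  ?item wdt:P31 wd:Q3305213 ;
        wdt:P195 wd:{collection} ;
        wdt:P18 ?image .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
"""


def build_painting_query(collection_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query selecting paintings with images from one collection."""
    # Validate the QID so arbitrary text cannot be injected into the query.
    if not re.fullmatch(r"Q[1-9]\d*", collection_qid):
        raise ValueError(f"not a Wikidata QID: {collection_qid!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return QUERY_TEMPLATE.format(collection=collection_qid, limit=int(limit)).strip()


# Example QIDs, assumed to identify museum collections; verify before use.
query_a = build_painting_query("Q190804", limit=5)
query_b = build_painting_query("Q19675", limit=5)
print(query_a)
```

Sampling a different collection then only means changing one argument. The resulting string can be sent to a SPARQL endpoint such as the Wikidata Query Service or a Wikibase instance's query service. For a Wikibase, the prefixes and property identifiers would differ.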

Tasks

Make the four Jupyter Notebooks
Copyright Page (Impressum) - this needs expanding to get more fields
Essay - how to collect an HTML document based on a search and the DOM
Painting Collection - move from Wikidata to Wikibase
Data analysis - choose a popular DH option, Neo4j-based
Use Google Colab and GitHub Codespaces or equivalent
Currently the system runs locally
Using a runtime environment like Google Colab means the user does not have to install anything. NB: There are many VM options; Colab is used only for demonstration purposes.
Style outputs
Only default styles have been used to date
Styles are needed for: Web (Bootstrap), PDF (LaTeX), ePub (CSS), DOCX (Word template)
Digital publishing good practice
Packaging: Web Publication Manifest, JATS, validation, etc.
Archiving and long-term preservation (LTP)
Open science practice: PIDs, deposit, software citation
Workflow for publishers, institutions, scholars
Demo, presentation slide deck, future work plan outline
Review complete list of notebooks, guides and classes
Review CPS proposal based on prototyping

Existing R&D

Prototypes, documentation, classes, publications, presentations

Existing prototypes have been built using the Quarto framework, which wraps Jupyter Notebooks and provides multi-format output. The Notebooks include SPARQL queries made against Wikidata or Wikibase, with Python written around them to process the results. Prototypes were made between the end of 2022 and August 2023 in cooperation with the COPIM research project, which ended in early 2023. Related prototyping has continued with the #semanticClimate project, but that work is not immediately relevant to the demo, as it concerns text retrieval from corpora, TDM, semantic annotation, automated literature reviews, knowledge graphs, and ML/NLP. Below is a sample of prototypes and class guides.
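
The pattern of wrapping Python around a SPARQL query can be illustrated with the result-processing step. The sketch below assumes the endpoint returns the standard SPARQL 1.1 JSON results format. The function names `flatten_bindings` and `to_markdown`, and the sample response, are hypothetical and made up for illustration. They are not taken from the prototype repositories.

```python
def flatten_bindings(results: dict) -> list[dict]:
    """Turn a SPARQL JSON result into a list of plain {variable: value} rows.

    Variables that are unbound in a row are omitted from the JSON binding,
    so they are filled in with None here.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


def to_markdown(rows: list[dict]) -> str:
    """Render rows as Markdown, one image (or a text fallback) per row."""
    blocks = []
    for row in rows:
        label = row.get("itemLabel") or "Untitled"
        if row.get("image"):
            blocks.append(f"![{label}]({row['image']})")
        else:
            blocks.append(f"{label} (no image)")
    return "\n\n".join(blocks)


# Made-up sample response in the SPARQL 1.1 JSON results shape.
SAMPLE_RESPONSE = {
    "head": {"vars": ["itemLabel", "image"]},
    "results": {
        "bindings": [
            {
                "itemLabel": {"type": "literal", "value": "Example painting A"},
                "image": {"type": "uri", "value": "https://example.org/a.jpg"},
            },
            {
                "itemLabel": {"type": "literal", "value": "Example painting B"},
            },
        ]
    },
}

rows = flatten_bindings(SAMPLE_RESPONSE)
print(to_markdown(rows))
```

Keeping the flattening separate from the rendering means the same rows could feed any of the output formats, since the Markdown is generated once and converted afterwards.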

Prototype Notebooks

(Nov 2022 onwards)

Teaching Quarto and Wikidata

Template used in class for teaching Quarto and Wikidata, Sept 28 2023. (This example is the basis for the CPS demo.)

Repo URL: https://github.com/NFDI4Culture/catalogue-003
Notebooks: Publication imprint collection from Thoth.pub API; Painting media and data from Wikidata.
Outputs: Web, PDF, ePub, DOCX
Tech stack: Quarto, Jupyter Notebooks, Python, SPARQL query, VSCode, GitHub, GitHub Pages, Wikidata, Thoth.pub API, Zenodo.

Computational Publishing for Collections

Computational Publication: Computational Publishing for Collections - ADA CP Prototype #1 - Nov 22

Repo URL: https://github.com/NFDI4Culture/cp4c/

Notebooks: Image from linked open data API (Wikidata); Linked open data query from SPARQL (Wikibase); 3D model with annotations (Wikibase, Semantic Kompakkt); QR Code generation; Embedded video (TIB AV Portal iframe); Query ORCID for works authored by a person; Linked open data API testing (Wikibase).
Outputs: Web, PDF, ePub, DOCX

ScholarLed - Open Access Presses catalogue

ScholarLed - Open Access Presses catalogue: An OA publisher group's book catalogue

Made by Simon Bowie of COPIM for the publisher group ScholarLed. Based on ideas developed during the NFDI4C and COPIM partnership. The variation here was to have a CI process generate a new website once a day. This went into live production in 2023.

https://github.com/SimonXIX/scholarled_catalogue

Notebook Library

Notebook Library (July 2023)

Test benchmark Notebooks for use in classes and for demonstration purposes.

Thoth API Notebook - publisher's book catalogue: https://nfdi4culture.github.io/ada-book-notebook/all_press.html
Painting Notebook - Retrieves painting images and data from Wikidata: https://github.com/NFDI4Culture/ada-painting-notebook
Quarto benchmark Notebook - this is to test that Quarto functions in VSCode: https://github.com/NFDI4Culture/ada-benchmark-notebook

Guides and classes

FSCI 2023: Publishing from Collections

Class Guide: FSCI 2023's E08 Publishing from Collections Using Linked Open Data Source and Computational Publishing Pipelines. 2023-07-28 v1.1

https://nfdi4culture.github.io/FSCI-Class-Publishing-from-Collections/

OpenKnowledge23: Automating Exhibition Catalogue Creation

Automating Exhibition Catalogue Creation - A Guide. 2023-03-28 v1.1

https://nfdi4culture.github.io/automating-exhibition-catalogue-creation-guide/

OpenKnowledge23 class: https://de.wikiversity.org/wiki/OpenKnowledge23

Tech stack

The objective is to have a tech stack that is open-source, adheres to digital sovereignty guidelines, and is ethically in line with democratic values, the UN Charter of Human Rights, and Open Science principles, practice, and values (UNESCO).
