DaMaLOS 2023 - Workshop proceedings

DaMaLOS 2023 took place on 29 May 2023, from 07:30 to 11:45 CEST, at ESWC.

Note: The proceedings are Open Access and available in the DaMaLOS 2023 collection of the PUBLISSO Fachrepositorium.

Editorial note DOI:10.4126/FRL01-006444982
From Floppy Disks to 5-Star LOD: FAIR Research Infrastructure for NFDI4Culture DOI:10.4126/FRL01-006444986 presented by Tabea Tietz (FIZ Karlsruhe)
Use cases and benefits of persistent identifiers for dataset elements DOI:10.4126/FRL01-006444991 presented by Janete Saldanha Bach (GESIS – Leibniz Institute for the Social Sciences)
Ya2ro: A tool for creating Research Objects from minimum metadata DOI:10.4126/FRL01-006444984 presented by Daniel Garijo (Universidad Politécnica de Madrid)
Connecting data repositories and DMP tools using maDMPs DOI:10.4126/FRL01-006444985 presented by Tomasz Miksa (TU Wien & SBA Research)
Tailoring Documentation for Stakeholder Groups on the Example of a Dashboard DOI:10.4126/FRL01-006444987 presented by Jan Bernoth (University of Potsdam)
A metadata schema for machine-actionable Software Management Plans DOI:10.4126/FRL01-006444988 presented by Leyla Jael Castro (ZB MED)
Modelling three dimensions of provenance for wet-lab experiments: prospective, retrospective, and evolution DOI:10.4126/FRL01-006444990 presented by Sascha Genehr (University of Rostock)
Making Metadata More FAIR Using Large Language Models DOI:10.4126/FRL01-006444995 presented by Sowmya S Sundaram (Stanford University)
About versioning ontologies or any digital object with clear semantics DOI:10.4126/FRL01-006444994 presented by María Poveda-Villalón (Universidad Politécnica de Madrid)
FAIR Research Object Assessment: A landscape analysis DOI:10.4126/FRL01-006444989 presented by Esteban González (Universidad Politécnica de Madrid)
FAIR Data Management Workflow for MRI Data DOI:10.4126/FRL01-006444992 presented by Nicolas Blumenröhr (Karlsruhe Institute of Technology)
An Improved Questionnaire for FAIRness Characterization DOI:10.4126/FRL01-006444993 presented by Leonardo Guerreiro Azevedo (IBM Research)
Enhancing Reproducibility and Trustability in Intelligent Asset Management Systems for the Railway Domain DOI:10.4126/FRL01-006444996 presented by Mario Scrocca (Cefriel)

Book of abstracts

From Floppy Disks to 5-Star LOD: FAIR Research Infrastructure for NFDI4Culture

Authors: Tabea Tietz (0000-0002-1648-1684), Oleksandra Bruns (0000-0002-8501-6700), Linnaea Söhn (0000-0001-8341-1187), Julia Tolksdorf (0000-0002-0495-5897), Etienne Posthumus (0000-0002-0006-7542), Jonatan Jalle Steller (0000-0002-5101-5275), Heike Fliegl (0000-0002-7541-115X), Ebrahim Norouzi, Jörg Waitelonis (0000-0001-7192-7143), Torsten Schrade (0000-0002-0953-2818), Harald Sack (0000-0001-7069-9804)

Abstract: NFDI4Culture is establishing an infrastructure for research data on material and immaterial cultural heritage within the German National Research Data Infrastructure (NFDI), in compliance with the FAIR principles. The NFDI4Culture Knowledge Graph is being developed and integrated with the Culture Information Portal to aggregate diverse and isolated data from the cultural research landscape, thereby increasing the discoverability, interoperability and reusability of cultural heritage data. This paper presents the research data management strategy of the long-term project NFDI4Culture, which combines a content management system (CMS) and a Knowledge Graph-based infrastructure to enable intuitive and meaningful interaction with research resources in the cultural heritage domain.

Keywords: research data - knowledge graphs - infrastructure - cultural heritage

Use cases and benefits of persistent identifiers for dataset elements

Authors: Janete Saldanha Bach [0000-0001-9011-5837], Claus-Peter Klas [0000-0002-7794-7716], Peter Mutschke [0000-0003-3517-8071]

Abstract: In many scientific disciplines, Persistent Identifiers (PIDs) are commonly available only at the study or dataset level, not at the level of the data objects within a dataset that researchers typically work with, which makes these objects difficult to track and cite. This paper focuses on registering PIDs for finer-grained elements of a dataset that represent the primary entities of research, such as survey variables in the Social Sciences, going beyond the traditional approach of assigning PIDs to entire datasets. The paper highlights the benefits of this approach for researchers and research data centers and discusses four use cases from KonsortSWD, a consortium of the German National Research Data Infrastructure (NFDI), showing how project partners adopt PIDs for dataset elements below the study level according to their needs, data types and available services. The use cases address requirements from the perspective of the empirical Social Sciences and target different types of dataset elements that should have a PID, such as individual survey variables, information bundles (variable groups), and qualitative data contained in observation recordings, interviews, and transcriptions. The paper concludes with functionalities whose implementation requires persistent identifiers, such as automated mechanisms for accessing elements of a dataset directly.

Keywords: Social Sciences survey variables - Persistent Identifiers - PIDs - Research data citation - Research data services - technical infrastructure

Ya2ro: A tool for creating Research Objects from minimum metadata

Authors: Antonia Floriana Pavel, Daniel Garijo [0000-0003-0454-7145]

Abstract: Research Objects (ROs) have been proposed as a packaging mechanism to aggregate the outputs of scientific investigations and capture their context and metadata in a machine-readable manner. However, creating ROs (and collecting the metadata of their constituent resources) is still time-consuming. In this demo, we present ya2ro, a tool designed to ease the creation of Research Objects following the RO-Crate specification. Given an input file listing external resources available on the Web (datasets, software, publications and people), ya2ro retrieves their metadata descriptions (if available) and creates an aggregated RO-Crate in both human-readable (HTML) and machine-readable (JSON-LD) form.
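The abstract does not describe ya2ro's input format or internals. To illustrate the kind of machine-readable output it mentions, the following sketch assembles a minimal RO-Crate 1.1 `ro-crate-metadata.json` document from a list of Web resources using only Python's standard library. The helper `build_crate`, the resource dictionaries and the example.org URLs are hypothetical, metadata retrieval from the Web is omitted, and the root entity leaves out properties the specification expects (such as description, datePublished and license).

```python
import json

# Context and profile IRIs defined by the RO-Crate 1.1 specification.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_PROFILE = "https://w3id.org/ro/crate/1.1"


def build_crate(name, resources):
    """Aggregate Web resources into a minimal RO-Crate metadata document.

    Hypothetical helper: `resources` is a list of dicts with an "id" (a URL),
    a "type" (an RO-Crate data entity type such as "File" or "Dataset")
    and a "name".
    """
    # Metadata descriptor: describes this JSON-LD file and points at the root.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_PROFILE},
        "about": {"@id": "./"},
    }
    # Root data entity: the research object that aggregates the resources.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "hasPart": [{"@id": r["id"]} for r in resources],
    }
    # One data entity per external resource, identified by its URL.
    entities = [
        {"@id": r["id"], "@type": r["type"], "name": r["name"]} for r in resources
    ]
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *entities]}


# Usage with placeholder resources (example.org URLs are illustrative only).
resources = [
    {"id": "https://example.org/data.csv", "type": "File", "name": "Example data"},
    {"id": "https://example.org/code.py", "type": ["File", "SoftwareSourceCode"], "name": "Example script"},
]
crate = build_crate("Example research object", resources)
print(json.dumps(crate, indent=2))
```

An HTML rendering, as ya2ro also produces, could be generated from the same `@graph` structure. The sketch stops at the JSON-LD.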

Keywords: Research Object - Metadata - Dataset - Software

Connecting data repositories and DMP tools using maDMPs

Authors: Tomasz Miksa [0000-0002-4929-7875], Sotirios Tsepelakis [0000-0003-0644-4174], David Eckhard [0000-0001-6642-6846], Max Moser

Abstract: The paper presents an integration of an InvenioRDM-based data repository with a DAMAP-based tool for data management plans (DMPs). We use machine-actionable DMPs (maDMPs) to orchestrate the integration and to automate DMP creation. The larger goal is to establish a common interface for communication between DMP tools and data repositories.

Keywords: maDMPs - InvenioRDM - data management - automation

Tailoring Documentation for Stakeholder Groups on the Example of a Dashboard

Authors: Jan Bernoth [0000-0002-4127-0053]

Abstract: Stakeholders play a crucial role in requirements engineering and can be considered an immediate target audience for documentation. This contribution presents a case study of a dashboard and its backend to demonstrate which documents can be created for documentation purposes and how they can partially comply with the FAIR principles.

Keywords: Software Documentation - Software Artifacts - Documentation - Research Software Engineering - Stakeholder - FAIR

A metadata schema for machine-actionable Software Management Plans

Authors: Olga Giraldo [0000-0003-2978-8922], Lukas Geist [0000-0002-2910-7982], Nelson Quiñones, Dhwani Solanki, Renato Alves, Dimitrios Bampalikis, José M. Fernández [0000-0002-4806-5140], Eva Martin del Pico, Fotis Psomopoulos [0000-0002-0222-4273], Allegra Via [0000-0002-3398-5462], Dietrich Rebholz-Schuhmann [0000-0002-1018-0370], Leyla Jael Castro [0000-0003-3986-0510]

Abstract: Data-driven research requires handling data together with the software used to collect, transform, and create such data. Data Management Plans (DMPs) have emerged as a systematic way to record the data management lifecycle for the data of a research project. Similarly, Software Management Plans (SMPs) follow the research software management lifecycle and complement DMPs. Initially, both DMPs and SMPs were conceived as text-based documents, sometimes guided by a set of questions targeting key points of the corresponding lifecycle. Machine-actionable DMPs improve on text-based DMPs by adding a semantic layer representing the most common elements relevant to DMPs, from datasets to funders. Here, we use the ELIXIR SMP as a use case and present a preliminary metadata schema including possible types and properties for representing machine-actionable SMPs.

Keywords: Research software - Management Plan - Metadata - machine-actionable

Modelling three dimensions of provenance for wet-lab experiments: prospective, retrospective, and evolution

Authors: Sascha Genehr [0000-0002-1702-6878], Meike Bielfeldt, Max Schröder [0000-0003-1522-494X], Susanne Stählke, Barbara Nebe, Sascha Spors, Frank Krüger [0000-0002-7925-3363]

Abstract: Provenance plays an important role in the documentation of research data. Born-digital documentation simplifies the acquisition of provenance information. In contrast to computational workflows, wet-lab experiments rely on manually created documentation based on previously established research protocols. While deviations from such protocols can easily be described using standard provenance models, the evolution of the protocols themselves is often not captured. This prevents the effective use of provenance models derived from experiments that follow revised protocols. In this article, we introduce three dimensions of provenance for wet-lab in-vitro experiments and discuss how ontologies can be used to represent them unambiguously. Based on a use case from a cell-biological electrical stimulation experiment, we describe the three dimensions and provide a model that closes the gap between the provenance models of research documentation that result from different protocol versions.

Keywords: Provenance - FAIRification - Protocols

Making Metadata More FAIR Using Large Language Models

Authors: Sowmya S Sundaram [0000-0002-0086-7582], Mark A. Musen [0000-0003-3325-793X]

Abstract: With the global increase in experimental data artifacts, harnessing them in a unified fashion runs into a major stumbling block: bad metadata. To bridge this gap, this work presents a Natural Language Processing (NLP)-informed application, called FAIRMetaText, that compares metadata. Specifically, FAIRMetaText analyzes the natural-language descriptions of metadata and provides a mathematical similarity measure between two terms. This measure can then be used to analyze varied metadata, either by suggesting terms for compliance or by grouping similar terms to identify replaceable ones. The efficacy of the algorithm is presented qualitatively and quantitatively on publicly available research artifacts, and an in-depth study of a wide variety of Large Language Models (LLMs) demonstrates large gains across metadata-related tasks. This software can drastically reduce the human effort of sifting through natural-language metadata when working with several experimental datasets on the same topic.
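The abstract does not state which similarity measure FAIRMetaText computes. Its evaluation centres on LLMs. The sketch below illustrates only the general idea of scoring two term descriptions and suggesting the closest vocabulary term. It deliberately substitutes a simple bag-of-words cosine similarity, so it is not the FAIRMetaText method. The helper names and the small vocabulary are hypothetical. An LLM-based approach would replace the word-count vectors with model embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case a description and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two descriptions' word counts (0.0 to 1.0)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def suggest_term(description, vocabulary):
    """Return the vocabulary term whose description is most similar (hypothetical helper)."""
    return max(vocabulary, key=lambda term: cosine_similarity(description, vocabulary[term]))


# Hypothetical target vocabulary: term name -> natural-language description.
vocabulary = {
    "organism": "the species or organism from which the sample was taken",
    "tissue": "the tissue or organ type of the sample",
}
print(suggest_term("species of the source organism", vocabulary))
```

Because word counts are non-negative, the score lies between 0.0 (no shared words) and 1.0 (identical word distributions). Grouping replaceable terms could reuse the same score with a chosen threshold.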

Keywords: Metadata - FAIRification - NLP - LLM - GPT

About versioning ontologies or any digital objects with clear semantics

Authors: Clement Jonquet [0000-0002-2404-1582], María Poveda-Villalón [0000-0003-3587-0367]

Abstract: The article discusses the versioning of ontologies and semantic artefacts developed using Semantic Web technologies. We describe methods for encoding versioning and other relevant information in metadata properties and illustrate them with examples from the MOD2.0 specification. Building on our experience with the AgroPortal ontology repository and Linked Open Vocabularies, we raise several questions, such as which metadata properties to use, how metadata values should be coordinated, and what should stay the same across versions and what should change. We propose recommendations for better versioning of ontologies with clear semantics (identifiers, descriptions, status, dates, links) and suggest that these recommendations can be generalized to any digital object that needs to be versioned and semantically described.

Keywords: versioning - ontologies - semantic artefacts - metadata

FAIR Research Object Assessment: A landscape analysis

Authors: Esteban González (0000-0003-4112-6825), Daniel Garijo (0000-0003-0454-7145), Oscar Corcho (0000-0002-9260-0753), Raul Palma (0000-0003-4289-4922), Malgorzata Wolniewicz (0000-0003-2388-0744), Aron Rynkiewicz (0000-0002-0528-7544)

Abstract: Research Objects (ROs) are becoming a popular means of capturing the context and research artefacts associated with a research investigation in both human-readable and machine-readable formats. However, it is unclear how well ROs themselves adhere to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. In this work, we describe a comprehensive FAIR assessment of more than 2500 ROs across multiple disciplines. Our work integrates FAIROs, our existing RO evaluation service, into the ROHub platform. We discuss the challenges of performing a FAIR assessment on aggregations of resources, and how we supplement the FAIROs tests with information from the RO-Crate descriptor file generated by ROHub.

Keywords: Research Object - Metadata - FAIR analysis

FAIR Data Management Workflow for MRI Data

Authors: Nicolas Blumenröhr [0009-0007-0235-4995], Neil MacKinnon [0000-0002-9362-4845], Rossella Aversa [0000-0003-2534-0063]

Abstract: We present a workflow to improve the management of Magnetic Resonance Imaging (MRI) data and to increase its compliance with the FAIR principles. The workflow uses the JSON Metadata Mapping Tool we have developed to map metadata from a domain-specific file format to a JSON-schema-based format, and stores the data and the mapped metadata in repositories. Some steps in the workflow are automated, while others require human intervention, facilitated by graphical user interfaces for each service. We assessed the compliance of our curated data with the FAIR principles, both manually and using the F-UJI tool. We obtained a FAIR assessment score of 79% for both datasets, the highest score compared with similar datasets in the same field. Based on these results, we conclude that the implemented workflow can significantly improve FAIR data management.

Keywords: FAIR Data Management - Metadata Schema - Metadata Mapping - Magnetic Resonance Imaging

An Improved Questionnaire for FAIR Characterization

Authors: Leonardo Guerreiro Azevedo [0000-0002-2109-1285], Julio Tesolin [0000-0002-0240-4506], Gabriel Banaggia [0000-0001-5555-894X], Renato Cerqueira [0000-0003-2829-7857]

Abstract: The FAIR guiding principles aim to enhance the discovery and use of digital objects by humans and computational agents. They are formulated at a high level and, as such, are interpreted and implemented in different ways by communities of practice. Practical approaches outlining the FAIR-related characteristics of digital objects are few and far between. This paper analyzes the FAIR principles in light of the various metrics, questionnaires, and tools proposed for manual, automated, and semi-automated FAIR assessment. We present an improved questionnaire for the FAIR characterization of digital objects. Our goal is not to give digital objects a FAIRness grade, but to outline their properties related to the FAIR principles at any point in their data life cycle. Different communities can use the questionnaire to characterize their assets. It is also designed from the outset to support the creation of bespoke metrics and to assist automated assessment in the future. We evaluated the questionnaire by using it to characterize digital data objects from two materials science data repositories, Materials Cloud and PubChem.

Keywords: FAIR - FAIRness Characterization - FAIR Questionnaire - Data Management

Enhancing Reproducibility and Trustability in Intelligent Asset Management Systems for the Railway Domain

Authors: Mario Scrocca [0000-0002-8235-7331], Ilaria Baroni [0000-0001-5791-8427], Alessio Carenini [0000-0003-1948-807X], Irene Celino [0000-0001-9962-7193]

Abstract: Defining an Intelligent Asset Management System (IAMS) requires integrating different digital artifacts to support the data-driven, proactive management of physical assets by companies. Such systems are often custom-designed for a specific scenario and do not take into account best practices that enable the sharing and reuse of digital artifacts. Additionally, it is crucial to track the involvement of the various stakeholders in developing system functionalities that influence maintenance processes, so that these activities can be audited effectively. This paper addresses these challenges and presents the IAMS Integration Support Framework, which was designed and demonstrated in the railway domain. The framework aims to enhance reproducibility and trustability in business scenarios within a multi-stakeholder environment.

Keywords: Intelligent maintenance - Reproducibility - Trustability