Workshop schedule

Tentative Date: 26th May 2024 TBC

Tentative Time: Morning session, 09:00 to 12:30 local time (EEST, i.e., UTC+3) TBC

Proceedings: Papers will be Open Access and available at the PUBLISSO Fachrepositorium DaMaLOS 2024 collection

Schedule

| Time (EEST) | Duration | Activity | Responsible |
| --- | --- | --- | --- |
| 09:00 - 09:05 | 5' | Welcome and introduction | Leyla Jael Castro |
| 09:05 - 10:00 | 40' + 15' Q&A | Keynote: FAIR for Machine Learning; Building on the Lessons from FAIR Software | Fotis Psomopoulos, INAB/CERTH |
| | | First session: FAIR, Data and Semantics in Action - Chair Sonja Schimmler | |
| 10:00 - 10:20 | 10' + 5' Q&A | What’s Cooking in the NFDI4Culture Kitchen? A KG-based Research Data Integration Workflow | Oleksandra Bruns, Information Service Engineering, Karlsruhe Institute of Technology |
| 10:20 - 10:30 | 5' + 5' Q&A | NFDI4Cat - Utilizing Semantic Web Technologies for a Data Infrastructure in Catalysis-Related Sciences | Bianca Wentzel, Fraunhofer FOKUS – Institute for Open Communication Systems |
| 10:30 - 11:00 | | Coffee break | |
| | | Second session: FAIR and FAIRness evaluators - Chair Danilo Dessi | |
| 11:00 - 11:20 | 15' + 5' Q&A | An Appraisal of Automated Tools for FAIRness Evaluation | Leonardo Guerreiro Azevedo, IBM Research - Brazil |
| 11:20 - 11:35 | 10' + 5' Q&A | Advancements and challenges in assessing FAIR Principles: a hybrid approach through manual and automated assessments | Janete Saldanha Bach, GESIS – Leibniz Institute for the Social Sciences |
| 11:35 - 11:55 | 15' + 5' Q&A | FAIR Data Publishing with Apache Maven | Claus Stadler, Institute for Applied Informatics |
| 11:55 - 12:10 | 10' + 5' Q&A | Towards FAIR Data in Energy System Research: Challenges and Perspectives | Amanda Wein, OFFIS e.V. and Carl von Ossietzky Universität Oldenburg |
| 12:10 - 12:25 | 10' + 5' Q&A | Towards Flexible Assessment of Metadata Quality in Research Meta Portals | Bianca Wentzel, Fraunhofer FOKUS – Institute for Open Communication Systems |
| 12:25 - 12:30 | 5' | Wrap-up | Leyla Jael Castro |

Keynote

This year DaMaLOS will have Dr. Fotis Psomopoulos as invited keynote speaker. Dr. Psomopoulos is a Senior Researcher at the Institute of Applied Biosciences at the Centre for Research and Technology Hellas in Thessaloniki, Greece. His research interests lie at the intersection of Bioinformatics and Machine Learning, primarily the design and implementation of novel algorithms for knowledge extraction from large datasets in the Life Sciences. In this capacity, he co-leads the ELIXIR Machine Learning Focus Group, which produced the DOME recommendations, and is a member of the CLAIRE network. He is also a proponent of Open and FAIR Research Software, co-leading relevant activities under ELIXIR and RDA and coordinating EOSC EVERSE, an EU project on research software quality. In addition to his research activities, he is active in training efforts; notably, he is a member of the ELIXIR Training Platform Executive Committee, a member of the EOSC-A Task Force on Research careers, recognition and credit, and a co-author of the Open Science Training Handbook.

Title: FAIR for Machine Learning; Building on the Lessons from FAIR Software

Abstract: Ensuring that data are FAIR is nowadays a clear expectation across all science domains, as a result of many years of global efforts. Research software has only started to receive the same level of attention in recent years, with targeted actions towards the definition of the FAIR principles as applied to research software, as well as concerted efforts around reproducibility, quality, and sustainability. Given the rapid rise of ML as a key technology across all science domains, it is important to build on our collective experience and start addressing the challenges ahead of us now, towards making ML FAIR.

Book of abstracts

What’s Cooking in the NFDI4Culture Kitchen? A KG-based Research Data Integration Workflow

Authors: Bruns, Oleksandra (0000-0002-8501-6700); Tietz, Tabea (0000-0002-1648-1684); Söhn, Linnaea (0000-0001-8341-1187); Steller, Jonatan Jalle (0000-0002-5101-5275); Ondraszek, Sarah Rebecca (0009-0003-7945-6704); Posthumus, Etienne (0000-0002-0006-7542); Schrade, Torsten (0000-0002-0953-2818); Sack, Harald (0000-0001-7069-9804)

Abstract: The National Research Data Infrastructure (NFDI) aims to provide a standardized and sustainable research data infrastructure across diverse domains, facilitating efficient research and scientific advancement. Despite encompassing a wide range of scientific disciplines, NFDI consortia share a foundation of common goals and concepts, emphasizing collaboration and data interoperability. Leveraging interconnected data offers new research opportunities, but requires that the data be available in Linked Open Data (LOD) format. Using NFDI4Culture as an example, this paper addresses the challenges of heterogeneous and isolated cultural heritage research data, and discusses efforts and results towards the creation of NFDI4Culture-KG, including the establishment of a research data index, the implementation of an ETL (Extract-Transform-Load) environment, and the engineering of lightweight semantic representations.

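NFDI4Cat - Utilizing Semantic Web Technologies for a Data Infrastructure in Catalysis-Related Sciences

Authors: Khatamirad, Mohammad (0000-0002-6723-1650); Wentzel, Bianca (0000-0002-9218-5676); Geske, Michael (0000-0002-1812-738X); Rosowski, Frank; Schimmler, Sonja (0000-0002-8786-7250)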

Abstract: NFDI4Cat is a consortium within the NFDI initiative that aims to establish a national research data infrastructure for catalysis-related sciences and to provide standards and guidelines to the community. In addition to the development of ontologies, pilot systems were set up to support everyday research early on and to demonstrate the data and metadata flow within the overall infrastructure. One such pilot system, which uses the semantic representation of metadata and its provision, is being developed at BasCat. It consists of a local data repository based on Dataverse, adapted to the needs of heterogeneous catalysis, and a central meta portal that provides metadata harvested from various resources, including the mentioned repository. Piveau is based on DCAT-AP, and Dataverse provides basic citation metadata. We are currently working on a domain-specific extension so that catalysis-related metadata can be semantically represented. The goal is to promote transparency and reproducibility, and to foster the sharing and reuse of research data in catalysis-related sciences, thus increasing the general exchange of knowledge.

An Appraisal of Automated Tools for FAIRness Evaluation

Authors: Guerreiro Azevedo, Leonardo (0000-0002-2109-1285); Banaggia, Gabriel (0000-0001-5555-894X); Tesolin, Julio (0000-0002-0240-4506); Cerqueira, Renato (0000-0003-2829-7857)

Abstract: The FAIR Principles were introduced to address data challenges and improve the Findability, Accessibility, Interoperability, and Reusability of digital resources, following several Semantic Web standards. ‘FAIRness’ corresponds to a percentage grade indicating how close a digital object is to fully abiding by those principles. Several tools have been developed to assess the FAIRness of data digital objects in support of enacting the FAIR Principles. This work offers an appraisal of tools that evaluate the FAIRness of such objects, focusing on fully automated solutions. We conduct a literature review of existing tools, extract from it a set of requirements they aim to fulfill, and assess how each tool fares against this set. Our results provide researchers and data stewards with an overview of the tools, including an analysis of the fulfillment of the requirements and of existing gaps.

Advancements and challenges in assessing FAIR Principles: a hybrid approach through manual and automated assessments

Authors: Saldanha Bach, Janete (0000-0001-9011-5837); Mutschke, Peter (0000-0003-3517-8071)

Abstract: This paper explores the adoption and assessment of FAIR (Findable, Accessible, Interoperable, Reusable) principles within the PID registration service of KonsortSWD under the National Research Data Infrastructure Germany (NFDI). Employing the Research Data Alliance - FAIR Data Maturity Model (RDA-FDMM) and the F-UJI Data Assessment Tool, it presents a dual approach combining manual and automated assessments to measure FAIR compliance. The study underscores the challenges of FAIR principles’ broad interpretability and the diversity of research outputs, advocating for a hybrid evaluation encompassing both machine-readable and non-machine-readable elements. Through the analysis, the paper presents insights into the efficacy of current tools and methodologies for FAIR assessment, reporting the limitations of automated tools and the critical role of comprehensive manual evaluation. The paper concludes with recommendations for improving automated scores in the tool used. It highlights the necessity of a “FAIR by design” approach from the beginning of a project or service development to ensure the embedding of FAIR principles in research outcomes.

FAIR Data Publishing with Apache Maven

Authors: Stadler, Claus (0000-0001-9948-6458); Bin, Simon; Bühmann, Lorenz

Abstract: Design and management of a large number of data processing pipelines is a challenging task. Analogous to DevOps, the term DataOps was coined to capture all the practices, processes and technologies related to the management of the life cycle of data artifacts, including the tracking of provenance. The solution space has been constantly growing, with novel approaches and tools becoming available; however, with more than 100 workflow engines available, for instance, it is no longer feasible to assess them all. Semantic Web technology features many aspects relevant to DataOps, such as interlinkability of resources, DCAT for building decentralized data catalogs, PROV-O for provenance descriptions, and VoID for describing statistics about the used classes and properties. Yet, there are only a few approaches that establish a coherent and holistic connection between these elements. In this work, we perform an in-depth analysis of the Apache Maven build system and its surrounding ecosystem to determine how they can be leveraged for automated data processing, publishing and RDF metadata generation with provenance tracking. We present three novel Maven plugins for SPARQL and RML execution, the creation of an RDF database file, and uploading artifacts to a CKAN instance. Finally, we present a prototype architecture where a Maven deployment of a geographic RDF dataset results in the automated generation of DCAT, PROV-O and VoID metadata such that datasets can be browsed on a map and filtered, e.g., by the used classes and properties. All our resources are freely available as Open Source.

Towards FAIR Data in Energy System Research: Challenges and Perspectives

Authors: Wein, Amanda (0009-0009-2960-3474); Bechara, Mazen (0009-0009-7554-3935); Rohde, Philipp D. (0000-0002-9835-4354); Vidal, Maria-Esther (0000-0003-1160-8727)

Abstract: FAIR data principles are important in the energy sector for enabling comparisons of data sets and ensuring that interested parties in this and other sectors can fully understand energy system data. However, these parties may have different priorities and requirements for the management of these data. This paper presents the results of a literature review of research data management requirements in the field of energy systems research, conducted to better understand which requirements are considered most important by energy researchers. The paper then outlines some tools that can help address data management requirements in the energy domain: the Open Energy Ontology, the NFDI4Energy platform, and the Leibniz Data Manager, a research data management repository that relies on Semantic Web technologies to support the FAIR principles throughout the lifecycle of energy-related data.

Towards Flexible Assessment of Metadata Quality in Research Meta Portals

Authors: Wentzel, Bianca (0000-0002-9218-5676); Peters, Michael (0009-0005-4669-4508); Chen, Zongxiong (0000-0003-2452-0572); Schimmler, Sonja (0000-0002-8786-7250)

Abstract: The NFDI initiative aims to establish national research data infrastructures as well as standards and guidelines. In this context, we are working on the provision of research artifacts enriched with (1) artifact-specific and (2) domain-specific metadata, and also on the evaluation of the quality and FAIRness of this metadata. For this purpose, extended metadata standards are currently being developed, which will serve as a basis for describing research results and for assessing quality and FAIRness in an artifact- and discipline-specific manner. We are utilizing Piveau, a data management ecosystem, to set up meta portals, giving unified access to research artifacts available in a distributed setting. Within Piveau, metadata is represented via DCAT-AP. We further deploy Piveau Metrics, a component of the Piveau ecosystem, for assessing the metadata quality and FAIRness of research artifacts. These assessments are based on the five-star principles for Linked Open Data and on the FAIR principles. Based on the extended metadata standards developed, we are currently adapting Piveau and Piveau Metrics in order to tailor them to artifact- and domain-specific settings.