Describing learning materials

The importance of metadata

Metadata plays a central role in implementing the FAIR principles and makes data and materials easier to share. When relevant metadata is associated with each learning resource, characteristics relevant to all four FAIR aspects can be identified at a glance. Descriptive metadata that is understandable by humans and readable by machines aids the findability of the content through search engines or specialized catalogues. Including information about the access rules and the associated licence improves the accessibility of the resources. Adding a summary description to each item and modelling its relations to other items through metadata supports the interoperability of the learning material, which in turn boosts its findability and reusability.

Metadata schemas

For metadata to be effective and to fulfil these objectives, its structure must be consistent and unambiguous, and it must adhere to a widely used schema. According to ISO, a metadata schema ¹ is "a logical plan showing the relationships between metadata elements, normally through establishing rules for the use and management of metadata specifically as regards the semantics, the syntax and the optionality (obligation level) of values". At present, a number of metadata schemas focused on learning resources exist; they vary in verbosity and address different aspects and subject areas. Schema.org ² is one of the most popular examples, and its versatility allows the same vocabulary to be used for different types of resources. When describing learning resources, the relevant type is "CreativeWork" ³. Because Schema.org is a community-led effort, existing types can be extended and new ones derived. Both "Course" ⁴ and "LearningResource" ⁵ (the latter of which is still not fully integrated) share the same "CreativeWork" parent.

Bioschemas ⁶ and the Open Educational Resources Schema (OER) ⁷ are two further examples, both based on the work of Schema.org. Both extend the existing vocabulary with additional terms and declare new types. Bioschemas is primarily focused on describing datasets, software and training materials related to the life sciences. OER is aimed at traditional learning materials and introduces granular types such as "Assessment", "Course", "Quiz" and "Project". Its context-specific terms, such as "termOffered", "department" and "program", indicate that the schema is mainly applicable to formal education institutions. The European Life Sciences Infrastructure for Biological Information (ELIXIR) ⁸ has also defined a metadata set to help trainees using its training platform identify the resources relevant to them. It is less verbose than the schemas described above and includes 13 core fields covering general information, prerequisites, and outcomes for each resource.

Whenever existing schemas are discussed or a completely new one is defined, a significant trade-off needs to be recognized. On the one hand, adding new fields improves overall descriptiveness and might increase the findability and reusability of the described resources. On the other hand, mandating a large number of distinct fields hinders adoption of the schema, making it more difficult to ensure that existing or new material conforms to it. To address this problem, a task force of the Research Data Alliance Education and Training on Handling Research Data Interest Group (RDA ETHRD-IG) developed the minimal RDA metadata set ⁹. This metadata set was derived by analysing six existing metadata schemas, some of which were described above, with the goal of creating an easily adoptable set of metadata elements. Resource creators are expected to benefit from such a metadata set, since it allows them to describe their learning resources when making them publicly available. The RDA metadata schema consists of 14 fields divided into 3 categories of information: descriptive, access, and educational. It is the recommended metadata set for both existing and new learning materials. An even more restricted profile, consisting of only 3 mandatory fields and aimed at faster conformity for existing materials, has also been proposed ¹⁰. These fields are:

Title – a human-readable name of the learning resource
Author(s) – the names of the entities authoring the learning resource
URL to resource – a URL resolving either to the learning resource itself or to a dedicated page that provides additional contextual information, including a direct link to the underlying resource
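
As a minimal sketch of how a catalogue might check the three mandatory fields listed above, the following Python snippet validates a record held as a dictionary. The key names (`title`, `authors`, `url`) and the function name are illustrative assumptions, not identifiers defined by the RDA recommendation.

```python
from urllib.parse import urlparse

# Hypothetical key names for the three mandatory fields; the RDA
# documents define the fields, not these Python identifiers.
MANDATORY_FIELDS = ("title", "authors", "url")


def missing_mandatory_fields(record: dict) -> list[str]:
    """Return the mandatory fields that are absent, empty, or malformed."""
    missing = [name for name in MANDATORY_FIELDS if not record.get(name)]
    # A URL that is present must at least have an http(s) scheme and a host.
    url = record.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            missing.append("url")
    return missing


# Made-up example record for illustration only.
example = {
    "title": "Introduction to FAIR learning materials",
    "authors": ["Jane Doe"],
    "url": "https://example.org/fair-intro",
}
print(missing_mandatory_fields(example))
```

An empty list means the record satisfies the restricted profile; any returned names point to fields that still need attention before the material is published.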

A number of training platforms are currently evaluating the RDA minimal metadata set, such as OpenPLATO ¹¹, the training catalogue of the SSHOC project ¹², together with its training videos ¹³ and Open Marketplace ¹⁴, and the NI4OS Training Platform ¹⁵. The Skills4EOSC Learning Platform has also adopted the RDA minimal metadata set to describe its hosted learning objects. Because the current EOSC efforts to define a common metadata schema for learning resources are adopting the proposed minimal schema, the RDA minimal metadata schema for learning resources ⁹ is recommended, with close attention paid to its future development.

Proposed extensions to the recommended RDA metadata schema

Agreeing on an existing, well-defined, and descriptive metadata set is essential for the reusability of materials and for their findability through general-purpose search engines and specialized catalogues, such as the one currently being developed by EOSC Future ¹⁶, which is envisioned to become the overarching training catalogue for the EOSC community. To aid the existing effort, we recommend extending the RDA metadata set with two additional fields, "isPartOf" and "isBasedOn". Both fields are already part of Schema.org and its derivatives. The allowed value for each field would be a URL to the respective resource. Including these two fields would increase the number of minimal metadata elements to 16, with the added benefit of better modelling hierarchical relationships between learning materials and improving the findability of related content. In essence, the metadata would better reflect the relationships between learning objects and allow information to be traced back to its source.
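
To illustrate how URL-valued "isPartOf" fields could be used to reconstruct a hierarchy, here is a small Python sketch. The records, the `url` and `title` keys, and the helper names are hypothetical; only the field names "isPartOf" and "isBasedOn" come from Schema.org.

```python
from collections import defaultdict

# Made-up records for illustration; a real catalogue would load these
# from its own metadata store.
records = [
    {"url": "https://example.org/course", "title": "FAIR course"},
    {"url": "https://example.org/module-1", "title": "Module 1",
     "isPartOf": "https://example.org/course"},
    {"url": "https://example.org/unit-1", "title": "Unit 1",
     "isPartOf": "https://example.org/module-1",
     "isBasedOn": "https://example.org/original-slides"},
]


def children_by_parent(items):
    """Group resource URLs under the URL given in their isPartOf field."""
    tree = defaultdict(list)
    for item in items:
        parent = item.get("isPartOf")
        if parent:
            tree[parent].append(item["url"])
    return dict(tree)


def ancestors(url, items):
    """Follow isPartOf links upwards from `url`, nearest parent first."""
    parent_of = {i["url"]: i.get("isPartOf") for i in items}
    chain, seen = [], {url}
    current = parent_of.get(url)
    while current and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


print(children_by_parent(records))
print(ancestors("https://example.org/unit-1", records))
```

The same traversal could be applied to "isBasedOn" to trace a resource back to the material it was derived from.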

Additionally, the set of possible values for the licence field could be further restricted by mandating that it be only a URL to the text of the associated licence. This would make the field easier for machines to interpret and avoid ambiguities that might arise from inconsistent spelling or from omitting the version of a given licence.
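
A restriction of this kind is straightforward to check automatically. The sketch below, with a hypothetical function name, accepts a licence value only if it looks like an absolute http(s) URL; it does not verify that the URL actually resolves to a licence text.

```python
from urllib.parse import urlparse


def is_licence_url(value: str) -> bool:
    """Return True if `value` looks like an absolute http(s) URL."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_licence_url("https://creativecommons.org/licenses/by/4.0/"))  # True
print(is_licence_url("CC-BY"))  # False
```

A free-text value such as "CC-BY" is rejected because it names neither a version nor a resolvable location, which is exactly the ambiguity the restriction is meant to remove.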

Controlled vocabularies as a framework for metadata values

To ensure the descriptive value of the additional information associated with each learning resource, and to keep it consistent across different applications, metadata schemas restrict the values that a given field may contain. A restriction can apply to the field's type (for example, differentiating between a text and a number field), to its cardinality (how many times it can be repeated), or to its content altogether, by specifying a set of pre-approved values from which the author or administrator can choose (a controlled vocabulary). Even though this might seem restrictive at first, it is necessary to ensure interoperability and to avoid ambiguity between platforms using the same metadata schema, while also providing a uniform experience to users. Content creators are strongly encouraged to adhere to the outlined guidelines. The document describing the recommended RDA metadata schema also includes information on such restrictions ¹⁷. At present, the following fields of the RDA minimal metadata schema have controlled vocabularies: Primary Language, Version Date, Resource URL Type, Target Group, Learning Resource Type, Access Cost, and Expertise Level.
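
The three kinds of restriction (type, cardinality, controlled vocabulary) can be sketched in Python as follows. The field keys and the allowed values shown here are placeholders chosen for illustration; they are not the vocabularies defined in the RDA documentation.

```python
# Illustrative constraints only: keys and allowed values are assumptions.
SCHEMA = {
    "title": {"type": str, "max_count": 1, "vocabulary": None},
    "keywords": {"type": str, "max_count": None, "vocabulary": None},
    "expertise_level": {
        "type": str,
        "max_count": 1,
        "vocabulary": {"beginner", "intermediate", "advanced"},
    },
}


def validate_field(name, values):
    """Return a list of problems found for the values given to one field."""
    rule = SCHEMA[name]
    problems = []
    # Cardinality: how many times the field may be repeated.
    if rule["max_count"] is not None and len(values) > rule["max_count"]:
        problems.append(f"{name}: at most {rule['max_count']} value(s) allowed")
    for value in values:
        # Type: e.g. text versus number.
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: {value!r} has the wrong type")
        # Controlled vocabulary: only pre-approved values are accepted.
        elif rule["vocabulary"] is not None and value not in rule["vocabulary"]:
            problems.append(f"{name}: {value!r} is not in the vocabulary")
    return problems


print(validate_field("expertise_level", ["expert"]))
```

Returning a list of problems rather than raising on the first error lets an authoring tool show all issues to the content creator at once.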

Metadata representation formats

Metadata can be even more relevant for machines than it is for humans. Providing metadata for learning materials in a machine-readable format helps ensure that automated tools such as search engines, crawlers, link generators, and bots interpret it in the intended context. To achieve this, learning infrastructures should be capable of serving the metadata in a variety of formats, such as an unstructured representation, Comma-Separated Values (CSV), JavaScript Object Notation (JSON), Extensible Markup Language (XML), and YAML Ain't Markup Language (YAML). The unstructured representation is most relevant for humans and can be presented in a visually appealing way without regard for machine readability. The CSV format can be beneficial for bulk information dumps because of its simplicity, its readability by humans, and its interoperability with existing software. However, the most popular formats today for data interchange between machines are JSON, XML and YAML. Most training catalogues and learning resource aggregators use at least one of these three formats to keep metadata in sync across the various training portals.
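
As a rough illustration of serving one metadata record in more than one format, the following sketch uses only the Python standard library to produce JSON and CSV. The record contents and key names are made up, and joining repeated values with ";" in the CSV output is one possible convention, not a prescribed one.

```python
import csv
import io
import json

# Made-up record for illustration only.
record = {
    "title": "Introduction to FAIR learning materials",
    "authors": ["Jane Doe", "John Smith"],
    "url": "https://example.org/fair-intro",
}


def to_json(rec):
    """Serialise the record as a JSON document."""
    return json.dumps(rec, indent=2, ensure_ascii=False)


def to_csv(rec):
    """Serialise the record as a header row plus one data row.

    CSV has no native list type, so repeated values are joined with ';'.
    """
    flat = {k: ";".join(v) if isinstance(v, list) else v for k, v in rec.items()}
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(flat))
    writer.writeheader()
    writer.writerow(flat)
    return buffer.getvalue()


print(to_json(record))
print(to_csv(record))
```

The JSON form keeps the list of authors as a list, whereas the CSV form has to flatten it, which is one reason structured formats are preferred for machine-to-machine interchange.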

A number of metadata schemas have also introduced application profiles based on the main standards for linked data, including JavaScript Object Notation for Linked Data (JSON-LD) and the Resource Description Framework (RDF). Such application profiles allow machines not only to read the data, but also to interpret it and understand the context in which it is provided. The first version of the application profile for the RDA schema ¹⁸ was recently released ¹⁹.
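
To give a sense of what a linked-data description might look like, the sketch below builds a JSON-LD document using Schema.org terms. It is an assumed mapping for illustration and does not reproduce the RDA application profile; the function name and the sample values are hypothetical.

```python
import json


def to_json_ld(title, authors, url, is_part_of=None):
    """Build a Schema.org JSON-LD description of a learning resource."""
    doc = {
        # The @context tells a consumer which vocabulary the keys belong to.
        "@context": "https://schema.org",
        "@type": "LearningResource",
        "name": title,
        "author": [{"@type": "Person", "name": a} for a in authors],
        "url": url,
    }
    if is_part_of:
        # Link to the enclosing resource by URL.
        doc["isPartOf"] = {"@type": "CreativeWork", "url": is_part_of}
    return doc


doc = to_json_ld(
    "Unit 1",
    ["Jane Doe"],
    "https://example.org/unit-1",
    is_part_of="https://example.org/module-1",
)
print(json.dumps(doc, indent=2))
```

The `@context` and `@type` keys are what distinguish this from plain JSON: they let a consumer resolve each key to a shared definition instead of guessing its meaning.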

Metadata longevity

Learning resources are expected to have a hierarchical structure in which, for example, multiple learning objects are joined together in a module, which is in turn part of a larger aggregation such as a course. Furthermore, as discussed so far, comprehensive metadata improves the findability and reusability of learning resources, leading to scenarios where a given learning resource is referenced by multiple, otherwise independent, resources in the hierarchy. Such cross-referencing is expected to be done using URLs that point to information hosted at various locations across the internet (for example, through the proposed "isPartOf" and "isBasedOn" metadata fields). It is natural to expect that these URLs will decay over time and that some of them will become unavailable because the resource has been moved, has expired, or has been deleted or corrupted.

The concept of metadata longevity ²⁰ is based on the idea that metadata must continue to exist even when the data to which it was originally assigned is no longer available. Decoupling the metadata from the resource itself makes it possible to describe what the original data was, and to assist in its interpretation, even when the original is not present. FAIR providers are encouraged to define a metadata longevity plan that fulfils these objectives.

El Grito de Sunset Park Use Case. Metadata: Schema development and Documentation. URL: https://elgrito.witness.org/portfolio/metadata-schema/ (visited on 2023-03-07).
Schema.org. Getting Started with schema.org. URL: https://schema.org/docs/gs.html (visited on 2023-03-07).
Schema.org. CreativeWork - A Schema.org Type. URL: https://schema.org/CreativeWork.
Schema.org. Course - A Schema.org Type. URL: https://schema.org/Course.
Schema.org. LearningResource - A Schema.org Type. URL: https://schema.org/LearningResource.
Bioschemas. Bioschemas. URL: https://bioschemas.org/ (visited on 2023-03-07).
Open Educational Resource Schema. An RDF Vocabulary for Open Educational Resources. URL: http://oerschema.org/ (visited on 2023-03-07).
Leyla Garcia, Bérénice Batut, Melissa L. Burke, Mateusz Kuzak, Fotis Psomopoulos, Ricardo Arcila, Teresa K. Attwood, Niall Beard, Denise Carvalho-Silva, Alexandros C. Dimopoulos, Victoria Dominguez del Angel, Michel Dumontier, Kim T. Gurwitz, Roland Krause, Peter McQuilton, Loredana Le Pera, Sarah L. Morgan, Päivi Rauste, Allegra Via, Pascal Kahlem, Gabriella Rustici, Celia W. G. van Gelder, and Patricia M. Palagi. Ten simple rules for making training materials FAIR. May 2020. URL: https://dx.plos.org/10.1371/journal.pcbi.1007854 (visited on 2023-03-07), doi:10.1371/journal.pcbi.1007854.
Hoebelheinrich, Nancy J, Biernacka, Katarzyna, Brazas, Michelle, Castro, Leyla Jael, Fiore, Nicola, Hellström, Margareta, Lazzeri, Emma, Leenarts, Ellen, Martinez Lavanchy, Paula Maria, Newbold, Elizabeth, Nurnberger, Amy, Plomp, Esther, Vaira, Lucia, van Gelder, Celia W G, and Whyte, Angus. Recommendations for a minimal metadata set to aid harmonised discovery of learning resources (1.0). 2022. URL: https://doi.org/10.15497/RDA00073.
EOSC Future. Training Catalogue - Minimal Metadata for Learning Resources - EOSC Future Public - Wiki. URL: https://wiki.eoscfuture.eu/display/PUBLIC/Training+Catalogue+-+Minimal+Metadata+for+Learning+Resources (visited on 2023-03-07).
OpenAIRE. OpenPlato - List course in catalog. URL: https://openplato.eu/blocks/catalog/list.php (visited on 2023-03-07).
SSHOC. Search entities | SSH Training Discovery Toolkit. URL: https://training-toolkit.sshopencloud.eu/entities?search=&f%5B0%5D=content_type%3Asource (visited on 2023-03-07).
SSHOC. SSHOC Training videos. URL: https://www.sshopencloud.eu/video-training.
SSHOC. SSH Open Marketplace. URL: https://www.sshopencloud.eu/ssh-open-marketplace.
NI4OS-Europe. NI4OS-Europe Training Platform. URL: https://training.ni4os.eu/ (visited on 2023-03-07).
EOSC Portal. EOSC Search Service - Training. URL: https://search.eosc-portal.eu/search/training?q=* (visited on 2023-03-07).
RDA ETHRD-IG. Minimal Metadata Set. URL: https://docs.google.com/document/d/1wEaGyuqnR4frusN692b2ngxsEmaoW83y38n88JZEEeQ/edit?usp=embed_facebook&usp=embed_facebook (visited on 2023-03-07).
Elizabeth Newbold, Gabin Kayumbi, Angus Whyte, and Emma Lazzeri. Summary Report: Workshop on Harmonising Training Resource Metadata for EOSC Communities. May 2021. Publisher: Zenodo. URL: https://zenodo.org/record/4769468 (visited on 2023-03-07), doi:10.5281/ZENODO.4769468.
A Primer on RDA Application Profiles by Melissa Parent | RDA Toolkit. URL: https://www.rdatoolkit.org/node/256 (visited on 2023-06-26).
FAIRsharing Team. FAIRsharing record for: FAIR Metrics - Metadata Longevity. 2018. URL: https://fairsharing.org/FAIRsharing.A2W4nz (visited on 2023-03-07), doi:10.25504/FAIRSHARING.A2W4NZ.