Metadata ingest files

Metadata ingest files are the input format for class icat.ingest.IngestReader. This class is intended to be used in scripts that read the metadata created by experiments into ICAT. The file format is a restricted version of the ICAT data XML file format.

The underlying idea is that ICAT data files are in principle suitable to encode the metadata to be ingested from the experiment. The only problem is that this file format is too powerful: it can encode any ICAT content. We want the ingest files from the experiment to create new Datasets and DatasetParameters, but we certainly don’t want these files to create new Instruments or Users in ICAT. We also want to control which Investigation newly created Datasets are added to. It would be rather difficult to control the power of the input format if we used plain ICAT data files for this purpose.

Note

The metadata ingest file format is versioned. This version number is independent from the python-icat version. It is incremented only when the format changes. The latest version of the metadata ingest file format is 1.1.

Changed in version 1.2.0: added metadata ingest file format version 1.1, which adds support for relating Datasets with Samples.

Differences compared to ICAT data XML files

Class icat.ingest.IngestReader takes an investigation argument. In the following, we refer to the Investigation given in this argument as the prescribed Investigation. The metadata ingest file format restricts ICAT data XML files in the following ways:

  • ingest files must contain one and only one data element, i.e. one chunk according to the Logical structure of ICAT data files.

  • the allowed object types are restricted to Dataset, DatasetInstrument, DatasetTechnique, and DatasetParameter.

  • the attributes in the object definitions for Datasets are restricted to name, description, startDate, and endDate.

  • object definitions for Datasets cannot include references to the related Investigation or DatasetType. These relations will be added by icat.ingest.IngestReader. The relation to the Investigation will be set to the prescribed Investigation.

  • object definitions for Datasets can reference a related Sample only by name or by pid. A relation of the related Sample with the prescribed Investigation will be implied.

  • references to the related Dataset in DatasetInstrument, DatasetTechnique, and DatasetParameter definitions are restricted to local keys. As a result, these objects can only relate to Datasets defined in the same ingest file.

  • other object references are restricted to reference by attributes.

These restrictions are enforced by validating the input against an XML Schema Definition (XSD).

Another change with respect to ICAT data XML files is that the name of the root element is icatingest and that it must have a version attribute.
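Some of the restrictions listed above can be approximated with a few structural checks, which may help when debugging a metadata writer. The following is only a minimal sketch using the Python standard library. It is not the XSD validation performed by icat.ingest.IngestReader and covers far fewer cases. The helper name check_ingest_structure and the two constant sets are hypothetical, and the element names investigation, type, and complete are assumed to be the ones a plain ICAT data file would use in a Dataset definition.

```python
import xml.etree.ElementTree as ET

# Object types allowed directly below <data>, taken from the list above.
ALLOWED_TYPES = {"dataset", "datasetInstrument", "datasetTechnique",
                 "datasetParameter"}
# Assumption: element names that a Dataset definition must not contain,
# because IngestReader sets these with prescribed values.
FORBIDDEN_IN_DATASET = {"investigation", "type", "complete"}


def check_ingest_structure(xml_text):
    """Return a list of violations found in xml_text (hypothetical helper)."""
    problems = []
    root = ET.fromstring(xml_text)
    # The root element must be icatingest and carry a version attribute.
    if root.tag != "icatingest":
        problems.append(f"root element is {root.tag!r}, expected 'icatingest'")
    if "version" not in root.attrib:
        problems.append("root element has no version attribute")
    # There must be one and only one data element.
    chunks = root.findall("data")
    if len(chunks) != 1:
        problems.append(
            f"found {len(chunks)} data elements, expected exactly one")
    for chunk in chunks:
        for obj in chunk:
            if obj.tag not in ALLOWED_TYPES:
                problems.append(f"object type {obj.tag!r} is not allowed")
            elif obj.tag == "dataset":
                for child in obj:
                    if child.tag in FORBIDDEN_IN_DATASET:
                        problems.append(
                            f"dataset must not define {child.tag!r}")
    return problems
```

An empty list means that none of these particular checks failed. It does not imply that the file would pass the real schema validation.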

Example

Consider the following example:

<?xml version='1.0' encoding='UTF-8'?>
<icatingest version="1.1">
  <head>
    <date>2024-02-02T12:52:00+01:00</date>
    <generator>metadata-writer 0.28</generator>
  </head>
  <data>
    <dataset id="Dataset_1">
      <name>e202553</name>
      <description>Dy01Cp02 at 2.7 K</description>
      <startDate>2020-09-30T18:02:17+02:00</startDate>
      <endDate>2020-09-30T20:18:36+02:00</endDate>
      <sample name="ab3465"/>
      <datasetInstruments>
        <instrument pid="DOI:00.0815/inst-00001"/>
      </datasetInstruments>
      <datasetTechniques>
        <technique pid="PaNET:PaNET01217"/>
      </datasetTechniques>
    </dataset>
    <dataset id="Dataset_2">
      <name>e202554</name>
      <description>Dy01Cp02 at 5.1 K</description>
      <startDate>2020-09-30T20:29:19+02:00</startDate>
      <endDate>2020-09-30T21:23:49+02:00</endDate>
      <sample name="ab3465"/>
      <datasetInstruments>
        <instrument pid="DOI:00.0815/inst-00001"/>
      </datasetInstruments>
      <datasetTechniques>
        <technique pid="PaNET:PaNET01217"/>
      </datasetTechniques>
    </dataset>
    <dataset id="Dataset_3">
      <name>e202555</name>
      <description>Dy01Cp02 at 2.7 K</description>
      <startDate>2020-09-30T21:35:16+02:00</startDate>
      <endDate>2020-09-30T23:04:27+02:00</endDate>
      <sample name="ab3466"/>
      <datasetInstruments>
        <instrument pid="DOI:00.0815/inst-00001"/>
      </datasetInstruments>
      <datasetTechniques>
        <technique pid="PaNET:PaNET01217"/>
      </datasetTechniques>
    </dataset>
    <dataset id="Dataset_4">
      <name>e202556</name>
      <description>reference</description>
      <startDate>2020-09-30T23:04:31+02:00</startDate>
      <endDate>2020-10-01T01:26:07+02:00</endDate>
      <datasetInstruments>
        <instrument pid="DOI:00.0815/inst-00001"/>
      </datasetInstruments>
      <datasetTechniques>
        <technique pid="PaNET:PaNET01217"/>
      </datasetTechniques>
    </dataset>
    <datasetParameter>
      <stringValue>neutron</stringValue>
      <dataset ref="Dataset_1"/>
      <type name="Probe"/>
    </datasetParameter>
    <datasetParameter>
      <numericValue>5.3</numericValue>
      <dataset ref="Dataset_1"/>
      <type name="Reactor power" units="MW"/>
    </datasetParameter>
    <datasetParameter>
      <numericValue>2.74103</numericValue>
      <rangeBottom>2.7408</rangeBottom>
      <rangeTop>2.7414</rangeTop>
      <dataset ref="Dataset_1"/>
      <type name="Sample temperature" units="K"/>
    </datasetParameter>
    <datasetParameter>
      <stringValue>neutron</stringValue>
      <dataset ref="Dataset_2"/>
      <type name="Probe"/>
    </datasetParameter>
    <datasetParameter>
      <numericValue>5.3</numericValue>
      <dataset ref="Dataset_2"/>
      <type name="Reactor power" units="MW"/>
    </datasetParameter>
    <datasetParameter>
      <numericValue>5.1239</numericValue>
      <rangeBottom>5.1045</rangeBottom>
      <rangeTop>5.1823</rangeTop>
      <dataset ref="Dataset_2"/>
      <type name="Sample temperature" units="K"/>
    </datasetParameter>
  </data>
</icatingest>

This file defines four Datasets with related objects. All Datasets have name, description, startDate, and endDate attributes, and each includes a relation with an Instrument and a Technique.

Note that the Datasets have no complete attribute and no relation with an Investigation or a DatasetType. All of these are added with prescribed values by class icat.ingest.IngestReader.

Some Datasets relate to Samples: the first two Datasets relate to the same Sample, the third Dataset to another Sample, while the last Dataset has no relation with any Sample. All Samples are referenced by their name. Class icat.ingest.IngestReader will add a reference to the prescribed Investigation to each of these Sample references, so that only Samples that are related to the prescribed Investigation can actually be referenced.

Some DatasetParameters are added as separate objects in the file. Each of them references its related Dataset using a local key that is defined in the id attribute of the corresponding Dataset earlier in the file. Alternatively, the DatasetParameters could have been included in the respective Datasets.
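To illustrate how the local keys tie the separate DatasetParameter objects to their Datasets, the following sketch resolves each ref attribute against the id attributes defined in the same file. It uses only the Python standard library and does not talk to ICAT. The function name parameters_by_dataset is hypothetical and not part of python-icat, and the sketch says nothing about how icat.ingest.IngestReader resolves the keys internally.

```python
import xml.etree.ElementTree as ET


def parameters_by_dataset(xml_text):
    """Group DatasetParameters by the name of the Dataset they reference.

    Hypothetical helper for illustration only.
    """
    data = ET.fromstring(xml_text).find("data")
    # Map each local key (the id attribute) to the name of its Dataset.
    # Only direct children of <data> are considered here.
    names = {ds.get("id"): ds.findtext("name")
             for ds in data.findall("dataset")}
    result = {name: [] for name in names.values()}
    for param in data.findall("datasetParameter"):
        key = param.find("dataset").get("ref")
        if key not in names:
            # Local keys can only refer to Datasets in the same file.
            raise ValueError(f"unknown local key {key!r}")
        type_name = param.find("type").get("name")
        # A parameter carries either a numeric or a string value;
        # both are returned as the text found in the file.
        value = param.findtext("numericValue")
        if value is None:
            value = param.findtext("stringValue")
        result[names[key]].append((type_name, value))
    return result
```

Applied to the example file above, the result would list three parameters each for e202553 and e202554 and empty lists for the remaining two Datasets.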