Provenance documentation

Provenance documentation of (automated) SOP steps is required to enable data reuse and validity checks. Provenance information needs to document the agent, entities and activities. It should facilitate reproducibility, but its main purpose is to document the execution steps rather than to enable fully automated re-execution, which would additionally require automated setup of the software environment (e.g. through Docker). Provenance of individual SOP steps should be recorded in a machine-readable format (e.g. a YAML file) like so:

    - executable:
        name: <executable name>
        version: <version string of executable>
        parameters:
          - name: <param-x_name>
            value: <param-x_value>
            hash: <md5 hash of file at <param-x_value>>  # optional, only for files
          - name: <param-y_name>
            value: <param-y_value>
      log: all the logging information from the executable
      hashes: null
      time: <time of execution: in utc, human-readable, with milliseconds (%Y%m%d %H:%M:%S.%f)>
    - executable:
      hashes: <md5 hashes of previous provenance file>
      time: ...
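
The template above could be filled in programmatically. The following is a minimal Python sketch, not a reference implementation: the function names `md5_of_file` and `build_record`, the key name `parameters` for the parameter list, and the rule "a value that is an existing file path gets a hash" are assumptions made for illustration. Serialising the resulting dictionary to YAML is left to whichever YAML library is in use.

```python
import hashlib
import os
from datetime import datetime, timezone

# Time format taken from the template; note that %f yields microseconds in Python.
TIME_FORMAT = "%Y%m%d %H:%M:%S.%f"


def md5_of_file(path):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(name, version, params, log, previous_hash=None, now=None):
    """Build one provenance entry as a plain dict (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    parameters = []
    for key, value in params.items():
        item = {"name": key, "value": value}
        # Assumption: only values that point at an existing file get a hash.
        if os.path.isfile(str(value)):
            item["hash"] = md5_of_file(str(value))
        parameters.append(item)
    return {
        "executable": {"name": name, "version": version, "parameters": parameters},
        "log": log,
        "hashes": previous_hash,  # None (null) for the first entry
        "time": now.strftime(TIME_FORMAT),
    }
```

The first entry of a provenance file would be built with `previous_hash=None`; later entries would pass the hash of the file as it existed before the new entry was appended.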

When working with the AGV/I folder structure, place this file at: /<volume>/<project>/<event>/<sensor>/protocol/<event>_<sensor>_provenance-<executable name>-<datetime>.yaml
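
As an illustration of the naming scheme, the path could be assembled as in the sketch below. The helper name `provenance_path` and the `stamp_format` used for `<datetime>` are assumptions; the text above does not specify how the datetime part of the file name is formatted.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def provenance_path(volume, project, event, sensor, executable, when,
                    stamp_format="%Y%m%dT%H%M%S"):
    """Assemble the provenance file path (hypothetical helper).

    stamp_format is an assumed rendering of <datetime>, not a documented one.
    """
    filename = (
        f"{event}_{sensor}_provenance-{executable}-"
        f"{when.strftime(stamp_format)}.yaml"
    )
    return PurePosixPath("/", volume, project, event, sensor, "protocol", filename)


# Example with made-up folder names.
example = provenance_path(
    "vol1", "projA", "ev01", "cam", "tool",
    datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```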

If an additional processing step is applied to an entity, the additional provenance information shall be appended to the provenance file that was written when the entity was created. Together with the SHA256 hash of the previous provenance file, this enables blockchain-like behaviour.
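
The chaining idea can be sketched as follows. This is a simplified illustration under stated assumptions: the entry text is reduced to a few fields, the helpers `digest`, `append_step` and `verify_last_step` are hypothetical, and the file is handled as raw bytes with plain string matching rather than a YAML parser. Because the template mentions MD5 while the text names SHA256, the algorithm is left as a parameter.

```python
import hashlib

ENTRY_MARKER = b"- executable:"


def digest(data, algorithm="sha256"):
    """Hex digest of a byte string with the chosen algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def append_step(content, executable_name, algorithm="sha256"):
    """Return the file content with one more entry appended.

    The new entry stores the hash of the content as it was before appending;
    the very first entry stores null.
    """
    previous_hash = digest(content, algorithm) if content else "null"
    entry = (
        "- executable:\n"
        f"    name: {executable_name}\n"
        f"  hashes: {previous_hash}\n"
    )
    return content + entry.encode("utf-8")


def verify_last_step(content, algorithm="sha256"):
    """Check that the last entry's hash matches everything that precedes it."""
    index = content.rfind(b"\n" + ENTRY_MARKER)
    start = 0 if index < 0 else index + 1
    previous = content[:start]
    recorded = None
    for line in content[start:].decode("utf-8").splitlines():
        if line.strip().startswith("hashes:"):
            recorded = line.split(":", 1)[1].strip()
    expected = digest(previous, algorithm) if previous else "null"
    return recorded == expected


first = append_step(b"", "tool-a")
second = append_step(first, "tool-b")
```

Any later change to an earlier entry alters the hash of the preceding content, so `verify_last_step` would return `False` for a file modified in that way.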