
Provenance documentation


Provenance documentation of (automated) SOP steps is required to enable reuse of data and validity checks. Provenance information needs to document the agents, entities and activities involved. It should facilitate reproducibility, but its main purpose is to document execution steps rather than to enable fully automated re-execution, which would additionally require the automated setup of the software environment (e.g. through Docker). Provenance of individual SOP steps should be recorded in a machine-readable fashion (e.g. in a JSON file). The W3C PROV standard defines how provenance information can be stored; however, this format is rather complex and was defined before the FAIR principles were established. At present, we know of no ideal provenance format, but the prerequisite remains that the agents, activities and entities of the workflow are recorded.

One possibility for FAIR marine images is to use the image-provenance field and/or to store an additional machine-readable provenance file next to the image data. Such a file could look like this:

{
    "provenance": {
        "executables": [
            {
                "name": <executable name>,
                "version": <version string of executable>,
                "parameters": [
                    {
                        "name": <param-x_name>,
                        "value": <param-x_value>,
                        "hash": <md5 hash of file at <param-x_value> (optional, only for files)>
                    },
                    {
                        "name": <param-y_name>,
                        "value": <param-y_value>
                    }
                ],
                "log": <all the logging information from the executable>,
                "hashes": null,
                "time": <time of execution: in utc, human-readable, with milliseconds (%Y%m%d %H:%M:%S.%f)>
            },
            {
                ...
                "parameters":
                ...
                "log":
                ...
                "hashes": <md5 hashes of previous provenance file>,
                "time": ...
            }
        ]
    }
}

When working with the folder structure, place this file at: /<volume>/<project>/<event>/<sensor>/protocol/<event>_<sensor>_provenance-<executable name>-<datetime>.json
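The template and file location above could be produced with a few lines of Python. This is only a minimal sketch under stated assumptions: the helper names `file_hash`, `make_entry` and `provenance_path` are hypothetical, the `<datetime>` part of the file name is assumed to be formatted as `%Y%m%d_%H%M%S`, and a parameter is treated as a file whenever its value points to an existing file.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path, PurePosixPath


def file_hash(path, algorithm="md5"):
    """Return the hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_entry(name, version, parameters, log, previous_hash=None, when=None):
    """Build one element of the "executables" list (hypothetical helper).

    `parameters` maps parameter names to values. A value that points to
    an existing file additionally gets a "hash" field.
    """
    when = when or datetime.now(timezone.utc)
    params = []
    for key, value in parameters.items():
        param = {"name": key, "value": value}
        if Path(str(value)).is_file():
            param["hash"] = file_hash(value)
        params.append(param)
    return {
        "name": name,
        "version": version,
        "parameters": params,
        "log": log,
        "hashes": previous_hash,  # None (null) for the first step
        # Format string taken from the template; note that Python's %f
        # writes six digits (microseconds), not three.
        "time": when.strftime("%Y%m%d %H:%M:%S.%f"),
    }


def provenance_path(volume, project, event, sensor, executable, when):
    """Assemble the file location from the folder structure."""
    stamp = when.strftime("%Y%m%d_%H%M%S")  # assumed <datetime> format
    filename = f"{event}_{sensor}_provenance-{executable}-{stamp}.json"
    return PurePosixPath("/", volume, project, event, sensor, "protocol", filename)
```

The first entry would then be written with `json.dump({"provenance": {"executables": [entry]}}, handle, indent=4)`.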

If an additional processing step is applied to an entity, the additional provenance information shall be appended to the provenance file written at the entity's creation. Together with the SHA256 hash of the previous provenance file, this enables blockchain-like behaviour.
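The chaining idea can be illustrated with a short Python sketch. It rests on several assumptions: the function names `append_entry` and `verify_link` are hypothetical, the hash is taken over the raw bytes of the previous file, and the extended record is written to a separate file so that the previous one stays available for verification. The hash algorithm is a parameter, since the template mentions MD5 and the text mentions SHA256.

```python
import hashlib
import json
from pathlib import Path


def append_entry(previous_file, new_file, entry, algorithm="sha256"):
    """Copy the entries of previous_file, append `entry`, write new_file.

    The appended entry's "hashes" field is set to the digest of the
    previous file's bytes, which links the two files together.
    """
    previous_bytes = Path(previous_file).read_bytes()
    document = json.loads(previous_bytes)
    link = hashlib.new(algorithm, previous_bytes).hexdigest()
    chained = dict(entry, hashes=link)
    document["provenance"]["executables"].append(chained)
    Path(new_file).write_text(json.dumps(document, indent=4), encoding="utf-8")
    return link


def verify_link(previous_file, new_file, algorithm="sha256"):
    """Check that the last entry of new_file references previous_file."""
    previous_bytes = Path(previous_file).read_bytes()
    document = json.loads(Path(new_file).read_text(encoding="utf-8"))
    last = document["provenance"]["executables"][-1]
    return last["hashes"] == hashlib.new(algorithm, previous_bytes).hexdigest()
```

Any later modification of the previous file changes its digest, so `verify_link` returns `False` and the tampering becomes detectable.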