ISISComputingGroup · rerpha · Jul 4, 2025 · Jun 26, 2025 · Jun 26, 2025 · Jul 3, 2025
diff --git a/doc/architectural_decisions/007-output-file-archiving.md b/doc/architectural_decisions/007-output-file-archiving.md
@@ -0,0 +1,283 @@
+# 7. Output file archiving
+
+## Status
+
+Draft
+
+## Context
+
+Our bluesky implementation contains bluesky callbacks which produce scientist-facing output files, for example:
+- [Human-readable scan result files](/callbacks/file_writing): {py:obj}`HumanReadableFileCallback <ibex_bluesky_core.callbacks.HumanReadableFileCallback>`
+- [Fitting results](/fitting/livefit_logger): {py:obj}`LiveFitLogger <ibex_bluesky_core.callbacks.LiveFitLogger>`
+- [Plot PNGs](#plot_png_saver): {py:obj}`PlotPNGSaver <ibex_bluesky_core.callbacks.PlotPNGSaver>`
+
+In addition, we have a [developer-facing callback for diagnostics](/callbacks/docs_logging_callback), 
+{py:obj}`DocLoggingCallback <ibex_bluesky_core.callbacks.DocLoggingCallback>`.
+
+The above callbacks produce files on disk in response to a bluesky scan. These files contain valuable data and so we
+need to consider how these files are archived for the long term. This must align with the 
+[ISIS Data Policy](https://www.isis.stfc.ac.uk/pages/data-policy.aspx). We should make an attempt to align with
+[FAIR principles](https://www.go-fair.org/fair-principles/).
+
+According to the definitions in the [ISIS Data Policy](https://www.isis.stfc.ac.uk/pages/data-policy.aspx), the data
+generated by bluesky is generally either "facility generated reduced data" or "metadata".
+
+This ADR is concerned with the location in which these bluesky output files are stored, and consequently with the
+archiving infrastructure used to keep them for the long term.
+
+---
+
+At the time of writing this ADR, in June 2025, the scientist-facing files are being written to
+```
+...\inst$\<instrument>\user\bluesky_scans\<rb_number>\
+```
+
+This location has some disadvantages:
+- It is a network location, which means that a site network outage will cause bluesky scans to fail to run
+- It is not a location designed for long-term storage of scientifically useful data, for example in terms of data integrity
+- It is not necessarily accessible from downstream systems such as TopCat
+
+Therefore, we would like to define a different, more suitable, location into which bluesky output files can be written.
+
+---
+
+Some representative use-cases are presented below, showing how data is expected to be used by scientists (click to
+expand each use case):
+
+<details>
+<summary>1 Bluesky scan, no neutron runs (e.g. scanning against a block)</summary>
+
+```{mermaid}
+sequenceDiagram
+actor PI
+participant NDX
+participant Archive
+participant TopCat
+note over PI:Start of RBNumber experiment
+PI ->> NDX: Start bluesky scan
+note over PI: Time Passes
+note over NDX: Bluesky scan ends
+note over NDX: creates scan.ascii and scan.nxs
+NDX ->> Archive: Sends scan.ascii and scan.nxs
+TopCat ->> Archive: Collects scan.ascii and scan.nxs
+note over PI: 5 months later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to scan.ascii and scan.nxs
+note over PI: 1 year later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to scan.nxs
+```
+</details>
+
+<details>
+<summary>1 Bluesky scan, aborted neutron runs</summary>
+
+```{mermaid}
+sequenceDiagram
+actor PI
+participant NDX
+participant Archive
+participant TopCat as Online Catalogue
+note over PI:Start of RBNumber experiment
+PI ->> NDX: Start bluesky scan
+note over NDX: DAE run started by scan <br/> Time passes <br/> Required data gathered in scan documents <br/> Abort DAE run
+note over NDX: DAE run started by scan <br/> Time passes <br/> Required data gathered in scan documents <br/> Abort DAE run
+note over NDX: DAE run started by scan <br/> Time passes <br/> Required data gathered in scan documents <br/> Abort DAE run
+note over NDX: Bluesky scan ends
+note over NDX: creates scan.ascii and scan.nxs
+NDX ->> Archive: Sends scan.ascii and scan.nxs
+TopCat ->> Archive: Collects scan.ascii and scan.nxs
+note over PI: 5 months later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to scan.ascii and scan.nxs
+note over PI: 1 year later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to scan.nxs
+```
+</details>
+
+<details>
+<summary>1 Bluesky scan, one neutron run</summary>
+
+```{mermaid}
+sequenceDiagram
+actor PI
+participant NDX
+participant Archive
+participant TopCat
+note over PI:Start of RBNumber experiment
+PI ->> NDX: Start bluesky scan
+note over NDX: Bluesky scan starts DAE run
+note over PI: Time Passes
+note over NDX: Bluesky scan ends DAE run <br/> Bluesky scan ends
+par
+note over NDX: creates runnumber.nxs with DAE and SE data
+and
+note over NDX: creates scan.ascii and scan.nxs
+end
+NDX ->> Archive: Sends runnumber.nxs, scan.ascii, and scan.nxs
+TopCat ->> Archive: Collects runnumber.nxs, scan.ascii, and scan.nxs
+note over PI: 5 months later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to runnumber.nxs, scan.ascii, and scan.nxs
+note over PI: 1 year later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to runnumber.nxs and scan.nxs
+```
+</details>
+
+<details>
+<summary>1 Bluesky scan, N neutron runs</summary>
+
+```{mermaid}
+sequenceDiagram
+actor PI
+participant NDX
+participant Archive
+participant TopCat
+note over PI:Start of RBNumber experiment
+PI ->> NDX: Start bluesky scan
+note over NDX: Bluesky scan starts DAE run
+note over PI: Time Passes
+note over NDX: Bluesky scan ends DAE run
+note over NDX: creates runnumber.nxs with DAE and SE data
+NDX ->> Archive: Sends runnumber.nxs
+TopCat ->> Archive: Collects runnumber.nxs
+note over PI: Time Passes
+note over NDX: Bluesky scan starts DAE run
+note over PI: Time Passes
+note over NDX: Bluesky scan ends DAE run
+note over NDX: creates runnumber+1.nxs with DAE and SE data
+NDX ->> Archive: Sends runnumber+1.nxs
+TopCat ->> Archive: Collects runnumber+1.nxs
+note over NDX: Bluesky scan ends
+NDX ->> Archive: Sends scan.ascii and scan.nxs
+TopCat ->> Archive: Collects scan.ascii and scan.nxs
+note over PI: 5 months later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to runnumber.nxs, runnumber+1.nxs, scan.ascii, and scan.nxs
+note over PI: 1 year later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to runnumber.nxs, runnumber+1.nxs, and scan.nxs
+```
+</details>
+
+<details>
+<summary>1 Bluesky scan, neutron/muon runs on multiple instruments</summary>
+
+```{mermaid}
+sequenceDiagram
+actor PI
+participant NDX-A
+participant NDX-B
+participant NDX-C
+participant Archive
+participant TopCat
+note over PI:Start of RBNumber experiment
+PI ->> NDX-A: Start bluesky scan
+NDX-A ->> NDX-B: Start DAE run
+NDX-A ->> NDX-C: Start DAE run
+note over PI: Time Passes
+NDX-B ->> NDX-A: Provides summary run data
+NDX-C ->> NDX-A: Provides summary run data
+NDX-A ->> NDX-B: End DAE run
+note over NDX-B: creates runnumberB.nxs with DAE and SE data
+NDX-B ->> Archive: Sends runnumberB.nxs
+TopCat ->> Archive: Collects runnumberB.nxs
+NDX-A ->> NDX-C: End DAE run
+note over NDX-C: creates runnumberC.nxs with DAE and SE data
+NDX-C ->> Archive: Sends runnumberC.nxs
+TopCat ->> Archive: Collects runnumberC.nxs
+note over NDX-A: Bluesky scan ends
+NDX-A ->> Archive: Sends scan.ascii and scan.nxs
+TopCat ->> Archive: Collects scan.ascii and scan.nxs
+note over PI: 5 months later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to runnumberB.nxs, runnumberC.nxs, scan.ascii, and scan.nxs
+note over PI: 1 year later
+PI ->> TopCat: Show me my data
+TopCat ->> PI: Provides access to runnumberB.nxs, runnumberC.nxs, and scan.nxs
+```
+</details>
+
+## Present
+
+The following people have been involved in discussions leading up to this ADR:
+
+- Tom
+- Chris M-S
+- George
+- Kathryn
+- Jack H
+- CK (Reflectometry)
+
+This document was additionally reviewed in a regular Thursday code-review slot by the whole IBEX team.
+
+## Decisions
+
+### File-writing location
+
+Bluesky should write data into the `c:\data\RB<rb_number>\bluesky_scans\` folder during a scan.
+File naming itself will keep its current scheme (timestamped files).
+
+This location was chosen because it mirrors the archiving setup used by neutron cameras on IMAT.
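+
+As an illustration of this layout, the sketch below builds an output path under `c:\data\RB<rb_number>\bluesky_scans\`.
+It is a minimal sketch, not the `ibex_bluesky_core` implementation: the helper name `scan_output_path` and the
+timestamp format are hypothetical, since the existing timestamp scheme is not specified here. `PureWindowsPath` is used
+so that the result is the same on any operating system.
+
+```python
+from datetime import datetime
+from pathlib import PureWindowsPath
+
+# The instrument data directory named in this ADR.
+DATA_ROOT = PureWindowsPath(r"c:\data")
+
+
+def scan_output_path(rb_number: int, now: datetime, suffix: str = ".txt") -> PureWindowsPath:
+    """Build a timestamped file path inside RB<rb_number>/bluesky_scans (hypothetical helper)."""
+    # Hypothetical timestamp format; it avoids ':' because that is not allowed in Windows file names.
+    stamp = now.strftime("%Y-%m-%dT%H%M%S")
+    return DATA_ROOT / f"RB{rb_number}" / "bluesky_scans" / f"{stamp}{suffix}"
+```
+
+For example, `scan_output_path(1234567, datetime(2025, 6, 26, 14, 30))` gives
+`c:\data\RB1234567\bluesky_scans\2025-06-26T143000.txt`.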
+
+### Attributes & checksums
+
+Bluesky should mark files as read-only, using Windows file attributes, when it has finished writing them. This is so
+that the archiving process can unambiguously tell whether a file has finished being written. It also reduces the
+likelihood that a file is accidentally modified.
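+
+A minimal Python sketch of this step follows, assuming the writer has already closed the file. The helper names are
+hypothetical. On Windows, `os.chmod` can only change the read-only attribute, and clearing the write bit sets it; on
+POSIX systems the same call removes write permission.
+
+```python
+import os
+import stat
+from pathlib import Path
+
+# Any of these bits being set means the file is still writable.
+WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
+
+
+def finalise_output_file(path: Path) -> None:
+    """Mark a finished output file read-only (hypothetical helper)."""
+    current = stat.S_IMODE(os.stat(path).st_mode)
+    # Clearing S_IWUSR (== stat.S_IWRITE) sets the read-only attribute on Windows;
+    # on POSIX it removes write permission for user, group and others.
+    os.chmod(path, current & ~WRITE_BITS)
+
+
+def is_finalised(path: Path) -> bool:
+    """True once no write permission bit is set, i.e. the archiver may take the file."""
+    return not (os.stat(path).st_mode & WRITE_BITS)
+```
+
+Checking the mode bits rather than calling `os.access` keeps the test meaningful even for privileged accounts, for
+which `os.access` may report any file as writable.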
+
+Checksums should be generated, either at the point when the data is initially generated, or by the archiving process
+just before it first copies or moves a file.
+
+We have agreed that we want to generate checksums for bluesky data, as is already done for DAE data. These checksums
+are useful to detect data corruption, which might occur in transit, or in place on instrument computers or archive servers.
+A number of checksumming approaches have been considered, and no approach has been chosen yet. The options discussed
+are:
+- **Use Windows alternate data streams**. This is how checksums are stored for existing DAE `.raw` files. It has the
+advantage that it is relatively simple to implement, but the disadvantage that alternate data streams do not map nicely
+onto Linux file systems.
+- **Generate one checksum per file**, for example `file.txt` would also have an associated `file.sha1.txt` containing the
+checksum. The advantage is that this is simple to implement and platform-agnostic. The disadvantage is that it doubles
+the number of files visible in the archive area.
+- **Generate a single checksum file** containing the checksums of all bluesky data, at a coarser granularity (for
+example one file per RB number or per cycle). It is currently unclear exactly how this approach would be implemented,
+and at what point these checksums would be moved to the archive.
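+
+To make the second option concrete, the sketch below writes a `file.sha1.txt` sidecar next to `file.txt` and later
+re-checks it. It assumes SHA-1 (suggested only by the example file name) and a `sha1sum`-style `<digest>  <name>` line
+format; neither has been decided, and the function names are hypothetical.
+
+```python
+import hashlib
+from pathlib import Path
+
+
+def sidecar_path(path: Path) -> Path:
+    """file.txt -> file.sha1.txt"""
+    return path.with_suffix(".sha1" + path.suffix)
+
+
+def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
+    """Hash the file in chunks so that large files need not fit in memory."""
+    digest = hashlib.sha1()
+    with open(path, "rb") as f:
+        for chunk in iter(lambda: f.read(chunk_size), b""):
+            digest.update(chunk)
+    return digest.hexdigest()
+
+
+def write_checksum(path: Path) -> Path:
+    """Write '<hex digest>  <file name>' (sha1sum style, an assumption) to the sidecar."""
+    checksum_file = sidecar_path(path)
+    checksum_file.write_text(f"{sha1_of(path)}  {path.name}\n")
+    return checksum_file
+
+
+def verify_checksum(path: Path) -> bool:
+    """Return True if the file still matches its recorded checksum."""
+    recorded = sidecar_path(path).read_text().split()[0]
+    return recorded == sha1_of(path)
+```
+
+The same `verify_checksum` call could be run on the instrument, after transfer, or periodically on the archive, which
+covers the in-transit and in-place corruption cases mentioned above.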
+
+### Moving to the ISIS archive
+
+An automated cron task will look for read-only Bluesky output files, and their associated checksums, in `c:\data` at
+regular short intervals (for example, 1 minute), and will move them to:
+- The ISIS data archive, under `autoreduced/bluesky_scans`. The `autoreduced` folder already exists on the archive. 
+- The data cache disk on the instrument, under `c:\data\Export only\RB<rb_number>\bluesky_scans`.
+
+Data on the cache disk, under `Export only`, is kept on the instrument for a short period (usually 24 hours), and then
+deleted by existing processes.
+
+This is run as a cron task so that, if the network happens to be unavailable at the time when a scan ends, the copy
+process will catch up when the network becomes available again. This cron task will only move files which sit within
+a `bluesky_scans` folder, to prevent it from interfering with other non-bluesky files.
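+
+One pass of such a task could look like the Python sketch below. It is illustrative only: the function names are
+hypothetical, `archive_dir` stands in for wherever `autoreduced/bluesky_scans` on the ISIS archive is mounted, and the
+per-RB layout within it (`<archive_dir>/RB<rb_number>/<file>`) is an assumption. A file is handled only once it is
+read-only and sits under `RB*/bluesky_scans`; if copying to the archive fails, the file is left in place so that a
+later pass picks it up.
+
+```python
+import shutil
+import stat
+from pathlib import Path
+
+# Any write permission bit set means the writer has not finalised the file yet.
+WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
+
+
+def is_read_only(path: Path) -> bool:
+    return not (path.stat().st_mode & WRITE_BITS)
+
+
+def archive_pass(data_root: Path, archive_dir: Path) -> list[Path]:
+    """Run one pass of the (hypothetical) archiving task.
+
+    Returns the paths, relative to data_root, of the files that were archived.
+    """
+    handled = []
+    # Only look inside RB*/bluesky_scans, so that non-bluesky files and the
+    # "Export only" area are never touched.
+    for src in sorted(data_root.glob("RB*/bluesky_scans/**/*")):
+        if not src.is_file() or not is_read_only(src):
+            continue  # not a file, or still being written: leave it for a later pass
+        rel = src.relative_to(data_root)  # RB<n>/bluesky_scans/<...>
+        archive_dest = archive_dir / rel.parts[0] / Path(*rel.parts[2:])
+        export_dest = data_root / "Export only" / rel
+        try:
+            archive_dest.parent.mkdir(parents=True, exist_ok=True)
+            # Skip the copy if an earlier pass archived this file but did not get as
+            # far as moving it; a real task should verify checksums at this point.
+            if not archive_dest.exists():
+                shutil.copy2(src, archive_dest)  # copy2 also copies the read-only bit
+        except OSError:
+            continue  # archive unreachable: keep the file and retry on the next pass
+        export_dest.parent.mkdir(parents=True, exist_ok=True)
+        shutil.move(src, export_dest)
+        handled.append(rel)
+    return handled
+```
+
+Checksum sidecars that are themselves marked read-only are moved in the same way. Within one volume `shutil.move` is a
+rename; across volumes it copies and then deletes the source, and deleting a read-only file fails on Windows, so the
+cache disk is assumed to be on the same volume as `c:\data`.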
+
+Creating a new `bluesky_scans` folder alongside the existing `autoreduced` folder was considered, but was judged
+impractical: it would require too much work compared with reusing the existing `autoreduced` folder.
+
+### File formats
+
+At present, our scan file output format is explicitly designed to be "human-readable" (and, in fact, the callback which
+generates these files is explicitly called
+{py:obj}`HumanReadableFileCallback <ibex_bluesky_core.callbacks.HumanReadableFileCallback>`).
+
+[Issue 26](https://github.com/ISISComputingGroup/ibex_bluesky_core/issues/26) will implement machine-readable files,
+using a format such as `.hdf5` or `.nxs`. These files will sit alongside the existing
+human-readable files; it is acknowledged that while machine-readable files are better from a data preservation and
+archiving standpoint, we will need to retain the human-readable files to support quick browsing by scientists without
+using special software.
+
+## Consequences
+
+- Bluesky output data will be stored in a location suitable for long-term, scientifically useful, data. This includes
+data integrity and availability concerns.
+- Bluesky scans will no longer rely on a network location being available in order to run.
+- The initial location where bluesky writes data (`c:\data\RB<rb_number>\bluesky_scans\`) will not be the same as its
+final location (the `autoreduced` folder on the ISIS archive). This is also true for current DAE data, as generated by
+the ISISICP.
diff --git a/doc/callbacks/plotting.md b/doc/callbacks/plotting.md
@@ -83,6 +83,7 @@ Due to an implementation detail of {py:obj}`matplotlib.pyplot.pcolormesh`,
 the plot will only appear once at least *two* rows of data have been collected.
 :::
 
+{#plot_png_saver}
 ## Saving plots to PNG files
 
 `ibex_bluesky_core` provides a {py:obj}`PlotPNGSaver<ibex_bluesky_core.callbacks.PlotPNGSaver>` callback which saves plots to PNG files on run stop. Files are written to the default output file location unless a filepath is explicitly given.

diff --git a/doc/conf.py b/doc/conf.py
@@ -29,7 +29,7 @@
     ("py:obj", r"^.*\.T.*_co$"),
 ]
 
-myst_enable_extensions = ["dollarmath", "strikethrough", "colon_fence"]
+myst_enable_extensions = ["dollarmath", "strikethrough", "colon_fence", "attrs_block"]
 suppress_warnings = ["myst.strikethrough"]
 
 extensions = [
@@ -43,7 +43,10 @@
     "sphinx.ext.intersphinx",
     # Add links to source code in API docs
     "sphinx.ext.viewcode",
+    # Mermaid diagrams
+    "sphinxcontrib.mermaid",
 ]
+mermaid_d3_zoom = True
 napoleon_google_docstring = True
 napoleon_numpy_docstring = False
 
@@ -70,7 +73,7 @@
 html_favicon = "favicon.svg"
 
 autoclass_content = "both"
-myst_heading_anchors = 3
+myst_heading_anchors = 7
 autodoc_preserve_defaults = True
 
 intersphinx_mapping = {

diff --git a/pyproject.toml b/pyproject.toml
@@ -59,6 +59,7 @@ doc = [
   "sphinx_rtd_theme", 
   "myst_parser",
   "sphinx-autobuild",
+  "sphinxcontrib-mermaid",
 ]
 dev = [
   "ibex_bluesky_core[doc]",