
Commit b0c10db

Reword checksum sections to reflect uncertainty in technical approach
1 parent ac68a39 commit b0c10db

1 file changed: +19 -7 lines changed

doc/architectural_decisions/007-output-file-archiving.md

Lines changed: 19 additions & 7 deletions
@@ -226,18 +226,30 @@ This location was chosen because it mirrors the archiving setup used by neutron
 
 Bluesky should mark files as read-only, using Windows file attributes, when it has finished writing them. This is so
 that the archiving process can unambiguously tell whether a file has finished being written. It also reduces the
-likelihood that a file is accidentally modified.
-
-Bluesky should generate checksums for each file it has finished writing, and insert those checksums into a windows
-alternative file stream, comparable to what is done for existing DAE data. These checksums
-can be used to check for data corruption as the files are moved to the archive, and later replicated between the
-archive servers.
+likelihood that a file is accidentally modified.
+
+Checksums should be generated, either at the point when the data is initially generated, or by the archiving process
+just before it first copies or moves a file.
+
+We have agreed on the desire to generate checksums for data, which is already done for DAE data. These checksums are
+useful to check for data corruption, which might occur in transit, or in-place on instrument computers or archive servers.
+A number of checksumming approaches have been considered, and no approach has been chosen yet. The options discussed
+are:
+- **Use windows alternate file streams**. This is how checksums are done in existing DAE `.raw` files. It has the
+advantage that it is relatively simple to implement, but the disadvantage that they do not map nicely onto Linux file
+systems.
+- **Generate one checksum per file**, for example `file.txt` would also have an associated `file.sha1.txt` containing the
+checksum. The advantage is that this is simple to implement and platform-agnostic. The disadvantage is that it doubles
+the number of files visible in the archive area.
+- **Generate a single checksum file** containing the checksums of all bluesky data, at a higher level of granularity (for
+example by RB number or by cycle). It is currently unclear exactly how this approach would be implemented, and at what
+point these checksums would be moved to the archive.
 
 ### Moving to the ISIS archive
 
 An automated cron task will look for read-only Bluesky output files, and their associated checksums, in `c:\data` at
 regular short intervals (for example, 1 minute), and will move them to:
-- The ISIS data archive, under the `autoreduced/bluesky_scans`. The `autoreduced` folder already exists on the archive.
+- The ISIS data archive, under `autoreduced/bluesky_scans`. The `autoreduced` folder already exists on the archive.
 - The data cache disk on the instrument, under `c:\data\Export only\RB<rb_number\bluesky_scans`.
 
 Data on the cache disk, under `Export only`, is kept on the instrument for a short period (usually 24 hours), and then
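
To make the second option in the diff (one checksum file per data file) more concrete, here is a minimal Python sketch. It is an illustration only: the document has not chosen an approach, and every function name here (`sha1_of`, `sidecar_path`, `finish_file`, `verify`) is hypothetical. The sidecar naming rule (`<stem>.sha1.txt`) and the use of SHA-1 are assumptions inferred from the `file.sha1.txt` example. The sketch also marks files read-only with `os.chmod`, which on Windows sets the read-only file attribute.

```python
import hashlib
import os
import stat
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_path(path: Path) -> Path:
    """Assumed naming rule: file.txt -> file.sha1.txt."""
    return path.with_name(f"{path.stem}.sha1.txt")


def mark_read_only(path: Path) -> None:
    """Clear write permission (the read-only attribute on Windows)."""
    os.chmod(path, stat.S_IREAD)


def is_read_only(path: Path) -> bool:
    """True if the owner write bit is cleared."""
    return not (path.stat().st_mode & stat.S_IWUSR)


def finish_file(path: Path) -> Path:
    """Write the checksum sidecar, then mark both files read-only.

    The data file is marked last, so a read-only data file implies
    that its sidecar has already been fully written.
    """
    sidecar = sidecar_path(path)
    sidecar.write_text(f"{sha1_of(path)}\n")
    mark_read_only(sidecar)
    mark_read_only(path)
    return sidecar


def verify(path: Path) -> bool:
    """Recompute the checksum and compare it with the sidecar."""
    expected = sidecar_path(path).read_text().strip()
    return sha1_of(path) == expected
```

In this sketch, an archiving task could call `is_read_only` to find finished files, and call `verify` after each copy or move to detect corruption in transit. Marking the data file read-only only after its sidecar exists is a design choice of the sketch, not something the document specifies.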
