@@ -226,18 +226,30 @@ This location was chosen because it mirrors the archiving setup used by neutron

  Bluesky should mark files as read-only, using Windows file attributes, when it has finished writing them. This is so
  that the archiving process can unambiguously tell whether a file has finished being written. It also reduces the
- likelihood that a file is accidentally modified.
-
- Bluesky should generate checksums for each file it has finished writing, and insert those checksums into a windows
- alternative file stream, comparable to what is done for existing DAE data. These checksums
- can be used to check for data corruption as the files are moved to the archive, and later replicated between the
- archive servers.
+ likelihood that a file is accidentally modified.
+
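As an illustration of the read-only convention described above, here is a minimal Python sketch. The function names `mark_read_only` and `is_finished` are hypothetical and not part of Bluesky. It relies on `os.chmod`, which on Windows can only toggle the read-only file attribute and on POSIX clears the write permission bits.

```python
import os
import stat
from pathlib import Path


def mark_read_only(path: Path) -> None:
    """Clear every write permission bit on a file that has been fully written.

    On Windows, os.chmod only honours the read-only flag, so this sets the
    read-only file attribute. On POSIX it removes the write bits from the mode.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def is_finished(path: Path) -> bool:
    """Return True if the file has no owner write bit, i.e. it is read-only."""
    return not (os.stat(path).st_mode & stat.S_IWUSR)
```

A writer would call `mark_read_only` as its last step, and an archiving process could skip any file for which `is_finished` returns `False`.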
+ Checksums should be generated either when the data is first written, or by the archiving process just before it
+ first copies or moves a file.
+
+ We have agreed that checksums should be generated for Bluesky data, as is already done for DAE data. These checksums
+ can be used to detect data corruption, which might occur in transit, or in place on instrument computers or archive servers.
+ Several checksumming approaches have been considered, but none has been chosen yet. The options discussed
+ are:
+ - **Use Windows alternate data streams**. This is how checksums are stored for existing DAE `.raw` files. It has the
+ advantage of being relatively simple to implement, but the disadvantage that alternate data streams do not map cleanly
+ onto Linux file systems.
+ - **Generate one checksum file per data file**; for example, `file.txt` would have an associated `file.sha1.txt` containing
+ its checksum. The advantage is that this is simple to implement and platform-agnostic. The disadvantage is that it doubles
+ the number of files visible in the archive area.
+ - **Generate a single checksum file** containing the checksums of all Bluesky data at a coarser granularity (for
+ example by RB number or by cycle). It is currently unclear exactly how this approach would be implemented, and at what
+ point these checksums would be moved to the archive.
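To make the second and third options more concrete, the following Python sketch shows one possible shape for each. It is only an illustration: no approach has been chosen, and the function names, the `checksums.sha1.txt` manifest name, and the `<hash>  <filename>` manifest layout (loosely modelled on `sha1sum` output) are all assumptions. The sidecar name follows the `file.txt` to `file.sha1.txt` example given in the list.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar_checksum(path: Path) -> Path:
    """One checksum file per data file: file.txt -> file.sha1.txt."""
    sidecar = path.with_name(path.stem + ".sha1.txt")
    sidecar.write_text(sha1_of_file(path) + "\n")
    return sidecar


def write_manifest(directory: Path, manifest_name: str = "checksums.sha1.txt") -> Path:
    """Single checksum file: one '<hash>  <filename>' line per file in a directory.

    The manifest name and line layout are hypothetical choices for this sketch.
    """
    lines = [
        f"{sha1_of_file(p)}  {p.name}"
        for p in sorted(directory.iterdir())
        if p.is_file() and p.name != manifest_name
    ]
    manifest = directory / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

The sidecar variant keeps each checksum next to its data file, which makes moving a single file straightforward. The manifest variant keeps the archive listing shorter, but it has to be regenerated or appended to whenever a new file appears in the directory.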

  ### Moving to the ISIS archive

  An automated cron task will look for read-only Bluesky output files, and their associated checksums, in `c:\data` at
  regular short intervals (for example, 1 minute), and will move them to:
- - The ISIS data archive, under the `autoreduced/bluesky_scans`. The `autoreduced` folder already exists on the archive.
+ - The ISIS data archive, under `autoreduced/bluesky_scans`. The `autoreduced` folder already exists on the archive.
  - The data cache disk on the instrument, under `c:\data\Export only\RB<rb_number\bluesky_scans`.
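The move step could look roughly like the Python sketch below. It is not the actual archiving task: the function names are hypothetical, the destination directories are passed in by the caller, and the handling of checksum files is left out because the checksum approach is still undecided. The scan is deliberately non-recursive, since the `Export only` destination itself lives under `c:\data`.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha1_of_file(path: Path) -> str:
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_ready_files(data_dir: Path) -> list[Path]:
    """List top-level files in data_dir that have no owner write bit set."""
    return sorted(
        p for p in data_dir.iterdir()
        if p.is_file() and not (p.stat().st_mode & stat.S_IWUSR)
    )


def move_verified(src: Path, destinations: list[Path]) -> None:
    """Copy src into every destination, verify each copy, then delete src."""
    expected = sha1_of_file(src)
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied = Path(shutil.copy2(src, dest_dir / src.name))
        if sha1_of_file(copied) != expected:
            raise OSError(f"checksum mismatch after copying {src} to {copied}")
    # Deleting a read-only file can fail on Windows, so clear the flag first.
    os.chmod(src, stat.S_IWRITE | stat.S_IREAD)
    src.unlink()
```

Copying to every destination and verifying before deleting the source means a failed or corrupted copy leaves the original file in place for the next run.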

  Data on the cache disk, under `Export only`, is kept on the instrument for a short period (usually 24 hours), and then