
Conversation

@Guiiix
Member

@Guiiix Guiiix commented Jun 25, 2025

This PR is related to issue QubesOS/qubes-issues#8767

A new option, disable snapshots, can be set on volumes (lvm, file, reflink and zfs are supported). When enabled on a private volume, the volume is used directly instead of creating and using a snapshot.

The option is enabled by setting revisions_to_keep = -1.

Using the volume directly saves space, but features like backup, DispVMs or cloning are unavailable while the volume is in use.
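
For illustration, the option could be toggled from the Admin API client side roughly like this (a minimal sketch, assuming the qubesadmin library and a qube named "work", which are not part of this PR):

```python
import qubesadmin

app = qubesadmin.Qubes()
volume = app.domains["work"].volumes["private"]   # "work" is an example qube name
volume.revisions_to_keep = -1   # disable snapshots; any value >= 0 re-enables them
```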

Guiiix added 2 commits June 25, 2025 22:41
This method is called by API endpoints expecting a valid number. This
commit adds a new parameter, 'allow_negative', so the method can also
accept negative numbers.

# Conflicts:
#	qubes/api/__init__.py
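
As a hedged sketch of the idea (the helper name and signature below are hypothetical, not the exact code in qubes/api/__init__.py), the change boils down to an integer parser for untrusted API input accepting a leading minus sign only when asked to:

```python
def parse_untrusted_int(untrusted_value: str, allow_negative: bool = False) -> int:
    # Hypothetical illustration only: strip a leading "-" when negatives are
    # allowed, then insist on plain digits, so revisions_to_keep=-1 can pass.
    digits = untrusted_value
    if allow_negative and untrusted_value.startswith("-"):
        digits = untrusted_value[1:]
    if not digits.isdigit():
        raise ValueError("invalid number: %r" % untrusted_value)
    return int(untrusted_value)
```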
This property is only applicable to private volumes (snap_on_start = False,
save_on_stop = True). Snapshots are disabled by setting
revisions_to_keep = -1.

# Conflicts:
#	qubes/storage/__init__.py
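
An illustrative sketch of that rule (names are assumptions, not the exact code in qubes/storage/__init__.py):

```python
def validate_revisions_to_keep(volume, value: int) -> None:
    # -1 (snapshots disabled) is only meaningful for private-style volumes.
    if value == -1 and (volume.snap_on_start or not volume.save_on_stop):
        raise ValueError(
            "revisions_to_keep = -1 (snapshots disabled) requires "
            "snap_on_start=False and save_on_stop=True")
```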
@codecov

codecov bot commented Jun 25, 2025

Codecov Report

Attention: Patch coverage is 78.18182% with 24 lines in your changes missing coverage. Please review.

Project coverage is 70.62%. Comparing base (be94b39) to head (f0b01f6).
Report is 37 commits behind head on main.

Files with missing lines       Patch %   Lines
qubes/vm/mix/dvmtemplate.py    27.27%    8 Missing ⚠️
qubes/storage/__init__.py      79.41%    7 Missing ⚠️
qubes/backup.py                28.57%    5 Missing ⚠️
qubes/storage/zfs.py           88.23%    2 Missing ⚠️
qubes/storage/lvm.py           87.50%    1 Missing ⚠️
qubes/storage/reflink.py       90.90%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #690      +/-   ##
==========================================
+ Coverage   70.51%   70.62%   +0.10%     
==========================================
  Files          61       61              
  Lines       13312    13395      +83     
==========================================
+ Hits         9387     9460      +73     
- Misses       3925     3935      +10     
Flag Coverage Δ
unittests 70.62% <78.18%> (+0.10%) ⬆️




@Guiiix Guiiix marked this pull request as ready for review June 25, 2025 22:23
@Guiiix Guiiix changed the title Disable snapshots New option disable snapshots to use a private volume directly Jun 25, 2025
@rustybird
Contributor

rustybird commented Jun 26, 2025

Thank you for tackling this! It should make a big difference with database workloads.

For file-reflink IMO it would be neater if the snapshot_disabled volume only had a _path_dirty file without a _path_clean file, instead of the other way around:

That would prevent some combinations which don't make sense, e.g. accidentally exporting this kind of volume or creating a _path_precache. (Setting revisions_to_keep to -1 should delete the precache and prune existing revisions after renaming* _path_clean to _path_dirty, so that it fails early in case of a severely broken volume with a missing _path_clean file.)

It would also keep _path_dirty = path = "the thing that's connected to Xen if the VM is running" constant in all cases. In lvm_thin, I always find the indirection in path/_vid_current and block_device a bit confusing.


* Edit: Or rather hardlinking _path_clean to _path_dirty and then removing _path_clean, so there's no chance of accidentally overwriting an existing _path_dirty
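
A small sketch of that hardlink-then-unlink idea (paths are illustrative, not driver code):

```python
import os

def promote_clean_to_dirty(path_clean: str, path_dirty: str) -> None:
    # os.link() refuses to clobber an existing path_dirty, so a broken volume
    # fails early instead of silently overwriting live data.
    os.link(path_clean, path_dirty)
    os.unlink(path_clean)
```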

@rustybird
Contributor

rustybird commented Jun 26, 2025

That would prevent some combinations which don't make sense, e.g. accidentally exporting this kind of volume

Sorry, I didn't get that export should be possible for a snapshot_disabled volume if it's stopped!

Okay that might be a point in favor of the way you did it, because if export() instead returned _path_dirty to a caller that doesn't immediately open the file, and then the user re-enables snapshots and starts the volume, later on the caller would use the actual live data. :(

Or is this also an issue with the current approach, but in the other direction? A volume with snapshots enabled is exported to a caller that doesn't immediately open the returned _path_clean file, and then the user disables snapshots and starts the volume.

Another reason to make export() return an open file handle - instead of a file name, which is inherently racy.

Maybe it doesn't even matter in practice, if all callers (even backup) of export() open the file quickly enough?
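
Not the actual driver API, just an illustration of why a handle is less racy than a name: once the file is open, later renames of the path (e.g. clean -> dirty on volume start) no longer change what the caller reads.

```python
def export_as_handle(path_clean: str):
    # The caller keeps reading from this descriptor even if the path is
    # renamed afterwards.
    return open(path_clean, "rb")
```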

@DemiMarie
Contributor

Making export() export a handle is definitely a good idea.

@marmarek
Member

Making export() export a handle is definitely a good idea.

There is a partial PR in that direction: #462 and also #461, but neither of them is complete.

Member

@marmarek marmarek left a comment


Ok, I've read the whole thing, and it looks pretty solid! I have just one minor request below.

I'll do some testing now...

" <-- The VM is running, backup will contain "
"its state from before its start!"
" <-- The VM is running and private volume snapshots "
"are disabled. Backup will fail!"
Member


Squash this commit into "backup: prevent backup of running VM when private volume snapshots are disabled"?

@marmarek
Member

With LVM pool, I was allowed to clone a running AppVM that has revisions_to_keep=-1 on its private volume...

@Guiiix
Member Author

Guiiix commented Jul 1, 2025

Thank you for all your feedback!

(Setting revisions_to_keep to -1 should delete the precache and prune existing revisions

Ah yes, I have to add this, thanks!

after renaming* _path_clean to _path_dirty, so that it fails early in case of a severely broken volume with a missing _path_clean file.)

The volume would be considered dirty even when stopped. Is that not a problem?

With LVM pool, I was allowed to clone a running AppVM that has revisions_to_keep=-1 on its private volume...

Sorry about that, I did the check in qubesadmin, but that is not the right place for it. Now that we have volume state files, I can do it in qubesd.

Guiiix added 14 commits July 1, 2025 18:30
If enabled, the COW image is no longer created, committed or destroyed.
The original private.img file is provided to libvirt (no snapshot).

# Conflicts:
#	qubes/storage/file.py
If enabled, all LVM snapshots are removed and the private volume itself is
provided directly to libvirt.

# Conflicts:
#	qubes/storage/lvm.py
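
Purely illustrative (the vid/-snap naming is borrowed loosely from the lvm_thin discussion above, not quoted from the driver): with snapshots disabled, libvirt would get the private LV itself rather than a snapshot device.

```python
def block_device_path(volume) -> str:
    if volume.revisions_to_keep == -1:       # snapshots disabled
        return "/dev/" + volume.vid          # the private volume directly
    return "/dev/" + volume.vid + "-snap"    # the usual snapshot device
```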
Disables snapshots of a private reflink volume if enabled.

# Conflicts:
#	qubes/storage/reflink.py
This can lead to unexpected behavior or even data loss

# Conflicts:
#	qubes/tests/api_admin.py
…e disabled

Backing up a running volume is prohibited

# Conflicts:
#	qubes/backup.py
#	qubes/tests/api_admin.py
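
A hedged sketch of that restriction (exception type and attribute names are assumptions, not the merged code in qubes/backup.py):

```python
def assert_private_volume_backupable(vm) -> None:
    volume = vm.volumes["private"]
    if vm.is_running() and volume.revisions_to_keep == -1:
        raise RuntimeError(
            "%s: private volume snapshots are disabled and the VM is running; "
            "backup would read live data, refusing" % vm.name)
```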
# Conflicts:
#	qubes/tests/api_admin.py
This allows tracking whether a volume is currently started without
having to check the related VM's state.

# Conflicts:
#	qubes/storage/__init__.py
This allows tracking whether a volume is currently started without
having to check the related VM's state.

# Conflicts:
#	qubes/storage/__init__.py
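
Illustrative only (file location and helper names are assumptions): a small marker file next to the volume is enough to record whether it is started, so qubesd does not have to query the owning VM.

```python
import os

def mark_started(state_path: str) -> None:
    open(state_path, "w").close()      # create the marker on volume start

def mark_stopped(state_path: str) -> None:
    if os.path.exists(state_path):
        os.remove(state_path)          # drop the marker on volume stop

def is_started(state_path: str) -> bool:
    return os.path.exists(state_path)
```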
For example, if a DispVM is based on a template whose private volume has
snapshots disabled, starting the DispVM must be denied.
Similarly, a template cannot be started while snapshots are disabled on its
private volume and at least one of its DispVMs is running.

# Conflicts:
#	qubes/vm/mix/dvmtemplate.py
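
An illustrative sketch of those two rules (plain exceptions and attribute names are assumptions; the real code in qubes/vm/mix/dvmtemplate.py differs):

```python
def check_dispvm_start(dispvm) -> None:
    if dispvm.template.volumes["private"].revisions_to_keep == -1:
        raise RuntimeError(
            "template's private volume has snapshots disabled; "
            "refusing to start DispVM")

def check_template_start(template) -> None:
    if template.volumes["private"].revisions_to_keep == -1 and any(
            dispvm.is_running() for dispvm in template.dispvms):
        raise RuntimeError(
            "snapshots disabled on the private volume and a DispVM is "
            "running; refusing to start the template")
```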
…vm is running

# Conflicts:
#	qubes/tests/api_admin.py
@Guiiix Guiiix force-pushed the disable_snapshots branch from 8c34954 to a3adbac Compare July 1, 2025 16:33
@rustybird
Contributor

rustybird commented Jul 1, 2025

The volume would be considered dirty even when stopped. This is not important?

It's not surfaced to the user, in fact nothing except for the individual storage drivers themselves ever calls is_dirty().

To me it seems like the abstract is_dirty() storage driver API method ought to be deleted from storage/__init__.py and storage/callback.py: Starting qubesd will already clobber the is_dirty status by stopping all volumes, so there's even less point in defining in a vacuum what exactly dirty means for every type of volume. (In file-reflink, it's probably clearest to inline the logic and delete the method there too?)

@Guiiix Guiiix force-pushed the disable_snapshots branch from 86715b2 to 0a481ce Compare July 2, 2025 18:10
@Guiiix
Member Author

Guiiix commented Jul 2, 2025

With the last commit, the file is renamed to -dirty at startup and back to -clean at shutdown. This way, the dirty volume is the one connected to Xen, but when the VM is not running we have a clean volume, just like when revisions_to_keep is >= 0. Are you okay with this?
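
A minimal sketch of that behaviour for a snapshots-disabled file-reflink volume (file names are illustrative, not the actual driver code):

```python
import os

def on_start(path_clean: str, path_dirty: str) -> None:
    os.rename(path_clean, path_dirty)   # the -dirty name is what Xen gets

def on_stop(path_dirty: str, path_clean: str) -> None:
    os.rename(path_dirty, path_clean)   # back to a clean image at shutdown
```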

@Guiiix Guiiix force-pushed the disable_snapshots branch from 0a481ce to cd17f94 Compare July 2, 2025 18:33
@rustybird
Contributor

rustybird commented Jul 3, 2025

Nice, that actually feels like the most natural way to fit in the feature.

Then _update_precache() should not call _copy_file() if snapshots_disabled, so resizing a volume won't create a precache only to remove it again on start. Oh you already did that too. Perfect! The file-reflink changes LGTM (untested, I'm still on R4.2)


async def import_volume(self, dst_volume, src_volume):
"""Helper function to import data from another volume"""
src_volume = self.get_volume(src_volume)
Member


The source volume for import usually comes from another VM, so get_volume() won't work if you give it just a name. You can either also require the source VM object in the arguments, or simply require a Volume object and not support a name in the src_volume arg - in which case the get_volume call is unnecessary.
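
For example, the helper could be shaped roughly like this (a sketch following the suggestion above, not the merged code):

```python
import asyncio

async def import_volume(self, dst_volume, src_volume):
    """Import data from ``src_volume`` (a Volume object) into ``dst_volume``."""
    dst_volume = self.get_volume(dst_volume)   # destination belongs to this VM
    result = dst_volume.import_volume(src_volume)
    if asyncio.iscoroutine(result):            # storage drivers may be sync or async
        result = await result
    return result
```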

Member Author


Ah I see. That makes sense, thank you!

@Guiiix Guiiix force-pushed the disable_snapshots branch from cd17f94 to f0b01f6 Compare July 3, 2025 16:23
@Guiiix
Member Author

Guiiix commented Jul 3, 2025

Nice, that actually feels like the most natural way to fit in the feature.

Then _update_precache() should not call _copy_file() if snapshots_disabled, so resizing a volume won't create a precache only to remove it again on start. Oh you already did that too. Perfect! The file-reflink changes LGTM (untested, I'm still on R4.2)

Thank you, if you want to try it on R4.2, you can use this branch:
https://github.com/Guiiix/qubes-core-admin/tree/disable_snapshot_4.2

I am on R4.2 too :)

@qubesos-bot

qubesos-bot commented Jul 3, 2025

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2025070405-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2025061004-4.3&flavor=update

  • system_tests_pvgrub_salt_storage

  • system_tests_kde_gui_interactive

    • kde_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/1YA~i-\d+-/...

    • kde_install: Failed (test died + timed out)
      # Test died: command '(set -o pipefail; sudo qubes-dom0-update -y k...

  • system_tests_guivm_vnc_gui_interactive

    • guivm_startup: unnamed test (unknown)
    • guivm_startup: Failed (test died)
      # Test died: no candidate needle with tag(s) 'desktop' matched...
  • system_tests_qwt_win10_seamless@hw13

    • windows_clipboard_and_filecopy: unnamed test (unknown)
    • windows_clipboard_and_filecopy: Failed (test died)
      # Test died: no candidate needle with tag(s) 'windows-Edge-address-...
  • system_tests_qwt_win11@hw13

    • windows_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/dcWzE-\d+-/...

    • windows_install: Failed (test died + timed out)
      # Test died: command 'script -e -c 'bash -x /usr/bin/qvm-create-win...

  • system_tests_basic_vm_qrexec_gui_xfs

    • TC_20_NonAudio_whonix-gateway-17-pool: test_012_qubes_desktop_run (error)
      subprocess.CalledProcessError: Command 'qubes.WaitForSession' retur...

Failed tests

15 failures
  • system_tests_pvgrub_salt_storage

  • system_tests_splitgpg

  • system_tests_extra

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      AssertionError: 1 != 0 : Timeout waiting for /dev/video0 in test-in...
  • system_tests_kde_gui_interactive

    • kde_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/1YA~i-\d+-/...

    • kde_install: Failed (test died + timed out)
      # Test died: command '(set -o pipefail; sudo qubes-dom0-update -y k...

  • system_tests_guivm_vnc_gui_interactive

    • guivm_startup: unnamed test (unknown)
    • guivm_startup: Failed (test died)
      # Test died: no candidate needle with tag(s) 'desktop' matched...
  • system_tests_qwt_win10_seamless@hw13

    • windows_clipboard_and_filecopy: unnamed test (unknown)
    • windows_clipboard_and_filecopy: Failed (test died)
      # Test died: no candidate needle with tag(s) 'windows-Edge-address-...
  • system_tests_qwt_win11@hw13

    • windows_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/dcWzE-\d+-/...

    • windows_install: Failed (test died + timed out)
      # Test died: command 'script -e -c 'bash -x /usr/bin/qvm-create-win...

  • system_tests_basic_vm_qrexec_gui_xfs

    • TC_20_NonAudio_whonix-gateway-17-pool: test_012_qubes_desktop_run (error)
      subprocess.CalledProcessError: Command 'qubes.WaitForSession' retur...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/142375#dependencies

11 fixed
  • system_tests_splitgpg

  • system_tests_extra

  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: wait_serial (wait serial expected)
      # wait_serial expected: "echo -e '[Layout]\nLayoutList=us,de' | sud...

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...

  • system_tests_guivm_vnc_gui_interactive

    • simple_gui_apps: unnamed test (unknown)
    • simple_gui_apps: Failed (test died)
      # Test died: no candidate needle with tag(s) 'vm-settings-applicati...
  • system_tests_audio

Unstable tests

Performance Tests

Performance degradation:

5 performance degradations
  • debian-12-xfce_exec-data-duplex-root: 89.40 🔺 ( previous job: 70.01, degradation: 127.70%)
  • whonix-gateway-17_socket: 9.08 🔺 ( previous job: 7.85, degradation: 115.67%)
  • whonix-workstation-17_exec-data-duplex-root: 101.77 🔺 ( previous job: 86.00, degradation: 118.33%)
  • dom0_root_rnd4k_q1t1_read 3:read_bandwidth_kb: 8613.00 🔺 ( previous job: 11086.00, degradation: 77.69%)
  • dom0_root_rnd4k_q1t1_write 3:write_bandwidth_kb: 675.00 🔺 ( previous job: 1840.00, degradation: 36.68%)

Remaining performance tests:

67 tests
  • debian-12-xfce_exec: 8.04 🟢 ( previous job: 8.63, improvement: 93.15%)
  • debian-12-xfce_exec-root: 28.87 🟢 ( previous job: 29.44, improvement: 98.07%)
  • debian-12-xfce_socket: 8.84 🔺 ( previous job: 8.50, degradation: 103.99%)
  • debian-12-xfce_socket-root: 7.75 🟢 ( previous job: 8.31, improvement: 93.20%)
  • debian-12-xfce_exec-data-simplex: 69.47 🔺 ( previous job: 65.51, degradation: 106.04%)
  • debian-12-xfce_exec-data-duplex: 72.15 🟢 ( previous job: 73.55, improvement: 98.10%)
  • debian-12-xfce_socket-data-duplex: 163.46 🔺 ( previous job: 161.35, degradation: 101.31%)
  • fedora-42-xfce_exec: 9.06
  • fedora-42-xfce_exec-root: 58.04
  • fedora-42-xfce_socket: 8.13
  • fedora-42-xfce_socket-root: 8.35
  • fedora-42-xfce_exec-data-simplex: 66.32
  • fedora-42-xfce_exec-data-duplex: 74.35
  • fedora-42-xfce_exec-data-duplex-root: 110.92
  • fedora-42-xfce_socket-data-duplex: 148.60
  • whonix-gateway-17_exec: 7.91 🔺 ( previous job: 7.34, degradation: 107.78%)
  • whonix-gateway-17_exec-root: 42.06 🔺 ( previous job: 39.57, degradation: 106.28%)
  • whonix-gateway-17_socket-root: 8.01 🔺 ( previous job: 7.89, degradation: 101.48%)
  • whonix-gateway-17_exec-data-simplex: 77.58 🟢 ( previous job: 77.76, improvement: 99.76%)
  • whonix-gateway-17_exec-data-duplex: 77.39 🟢 ( previous job: 78.39, improvement: 98.73%)
  • whonix-gateway-17_exec-data-duplex-root: 99.31 🔺 ( previous job: 90.74, degradation: 109.45%)
  • whonix-gateway-17_socket-data-duplex: 168.37 🔺 ( previous job: 161.95, degradation: 103.97%)
  • whonix-workstation-17_exec: 7.96 🟢 ( previous job: 8.27, improvement: 96.19%)
  • whonix-workstation-17_exec-root: 56.28 🟢 ( previous job: 57.61, improvement: 97.70%)
  • whonix-workstation-17_socket: 9.47 🔺 ( previous job: 8.97, degradation: 105.61%)
  • whonix-workstation-17_socket-root: 8.27 🟢 ( previous job: 9.46, improvement: 87.46%)
  • whonix-workstation-17_exec-data-simplex: 65.28 🟢 ( previous job: 74.54, improvement: 87.58%)
  • whonix-workstation-17_exec-data-duplex: 77.86 🔺 ( previous job: 74.84, degradation: 104.04%)
  • whonix-workstation-17_socket-data-duplex: 159.67 🟢 ( previous job: 160.20, improvement: 99.67%)
  • dom0_root_seq1m_q8t1_read 3:read_bandwidth_kb: 475760.00 🟢 ( previous job: 289982.00, improvement: 164.07%)
  • dom0_root_seq1m_q8t1_write 3:write_bandwidth_kb: 215845.00 🟢 ( previous job: 101988.00, improvement: 211.64%)
  • dom0_root_seq1m_q1t1_read 3:read_bandwidth_kb: 96904.00 🟢 ( previous job: 14284.00, improvement: 678.41%)
  • dom0_root_seq1m_q1t1_write 3:write_bandwidth_kb: 45227.00 🟢 ( previous job: 32696.00, improvement: 138.33%)
  • dom0_root_rnd4k_q32t1_read 3:read_bandwidth_kb: 21250.00 🟢 ( previous job: 17102.00, improvement: 124.25%)
  • dom0_root_rnd4k_q32t1_write 3:write_bandwidth_kb: 3061.00 🟢 ( previous job: 1091.00, improvement: 280.57%)
  • dom0_varlibqubes_seq1m_q8t1_read 3:read_bandwidth_kb: 464177.00 🟢 ( previous job: 289182.00, improvement: 160.51%)
  • dom0_varlibqubes_seq1m_q8t1_write 3:write_bandwidth_kb: 117701.00 🔺 ( previous job: 122848.00, degradation: 95.81%)
  • dom0_varlibqubes_seq1m_q1t1_read 3:read_bandwidth_kb: 438367.00 🟢 ( previous job: 433654.00, improvement: 101.09%)
  • dom0_varlibqubes_seq1m_q1t1_write 3:write_bandwidth_kb: 151628.00 🔺 ( previous job: 167872.00, degradation: 90.32%)
  • dom0_varlibqubes_rnd4k_q32t1_read 3:read_bandwidth_kb: 103480.00 🔺 ( previous job: 108760.00, degradation: 95.15%)
  • dom0_varlibqubes_rnd4k_q32t1_write 3:write_bandwidth_kb: 8902.00 🟢 ( previous job: 8874.00, improvement: 100.32%)
  • dom0_varlibqubes_rnd4k_q1t1_read 3:read_bandwidth_kb: 8277.00 🟢 ( previous job: 6356.00, improvement: 130.22%)
  • dom0_varlibqubes_rnd4k_q1t1_write 3:write_bandwidth_kb: 4528.00 🟢 ( previous job: 4420.00, improvement: 102.44%)
  • fedora-42-xfce_root_seq1m_q8t1_read 3:read_bandwidth_kb: 367019.00
  • fedora-42-xfce_root_seq1m_q8t1_write 3:write_bandwidth_kb: 298739.00
  • fedora-42-xfce_root_seq1m_q1t1_read 3:read_bandwidth_kb: 327475.00
  • fedora-42-xfce_root_seq1m_q1t1_write 3:write_bandwidth_kb: 122488.00
  • fedora-42-xfce_root_rnd4k_q32t1_read 3:read_bandwidth_kb: 83504.00
  • fedora-42-xfce_root_rnd4k_q32t1_write 3:write_bandwidth_kb: 4582.00
  • fedora-42-xfce_root_rnd4k_q1t1_read 3:read_bandwidth_kb: 7771.00
  • fedora-42-xfce_root_rnd4k_q1t1_write 3:write_bandwidth_kb: 1280.00
  • fedora-42-xfce_private_seq1m_q8t1_read 3:read_bandwidth_kb: 382134.00
  • fedora-42-xfce_private_seq1m_q8t1_write 3:write_bandwidth_kb: 255127.00
  • fedora-42-xfce_private_seq1m_q1t1_read 3:read_bandwidth_kb: 316981.00
  • fedora-42-xfce_private_seq1m_q1t1_write 3:write_bandwidth_kb: 62677.00
  • fedora-42-xfce_private_rnd4k_q32t1_read 3:read_bandwidth_kb: 44045.00
  • fedora-42-xfce_private_rnd4k_q32t1_write 3:write_bandwidth_kb: 2658.00
  • fedora-42-xfce_private_rnd4k_q1t1_read 3:read_bandwidth_kb: 8415.00
  • fedora-42-xfce_private_rnd4k_q1t1_write 3:write_bandwidth_kb: 784.00
  • fedora-42-xfce_volatile_seq1m_q8t1_read 3:read_bandwidth_kb: 416763.00
  • fedora-42-xfce_volatile_seq1m_q8t1_write 3:write_bandwidth_kb: 112648.00
  • fedora-42-xfce_volatile_seq1m_q1t1_read 3:read_bandwidth_kb: 319395.00
  • fedora-42-xfce_volatile_seq1m_q1t1_write 3:write_bandwidth_kb: 78328.00
  • fedora-42-xfce_volatile_rnd4k_q32t1_read 3:read_bandwidth_kb: 44282.00
  • fedora-42-xfce_volatile_rnd4k_q32t1_write 3:write_bandwidth_kb: 4307.00
  • fedora-42-xfce_volatile_rnd4k_q1t1_read 3:read_bandwidth_kb: 7956.00
  • fedora-42-xfce_volatile_rnd4k_q1t1_write 3:write_bandwidth_kb: 1667.00

@marmarek marmarek merged commit f79551c into QubesOS:main Jul 4, 2025
4 of 5 checks passed