CA-414627: increase polling duration for tapdisk (#6611)

edwintorok · web-flow · commit ebeb71783c24 · 2025-08-04T10:22:27.000Z
Polling mode is a performance optimization in tapdisk: it busy-polls
the PV ring for the next request, instead of going to sleep and
waiting for an event channel to wake it up when the next request
arrives. When the guest supplies us with a steady sequence of IO
requests this is quicker (it avoids a lot of syscalls), at the expense
of consuming more Dom0 CPU.
Once in polling mode, tapdisk exits it again if no requests arrive for
polling-duration microseconds.
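
The exit rule can be illustrated with a deliberately simplified model.
This is a sketch, not tapdisk's code: `count_polling_entries` is a
hypothetical helper, and it assumes that any request arriving in event
mode enters polling mode (the idle-CPU threshold is ignored).

```ocaml
(* Hypothetical model, not tapdisk code. [arrivals] is a sorted list of
   request arrival times in microseconds. Assumption: a request that
   arrives in event mode always switches to polling mode. *)
let count_polling_entries ~polling_duration_us arrivals =
  let rec go entries last = function
    | [] ->
        entries
    | t :: rest ->
        let entries =
          match last with
          (* Gap shorter than the polling duration: still polling. *)
          | Some prev when t - prev < polling_duration_us ->
              entries
          (* First request, or polling timed out: enter polling again. *)
          | _ ->
              entries + 1
        in
        go entries (Some t) rest
  in
  go 0 None arrivals

let () =
  (* Synthetic workload: one request every 2000 microseconds. *)
  let arrivals = [0; 2000; 4000; 6000] in
  List.iter
    (fun d ->
      Printf.printf "duration=%d us -> %d polling entries\n" d
        (count_polling_entries ~polling_duration_us:d arrivals))
    [1000; 8000]
```

In this model a polling duration shorter than the gap between requests
re-enters polling mode on every request, while a longer one enters it
once and stays there.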

The wakeup granularity is 4ms on the 4.19 kernel (which also matches
CONFIG_HZ=250).
The default polling duration is 1ms, but we may not be able to reliably
handle such low timeouts.
On newer kernels we seem to hit the 1ms polling duration timeout all the
time, which results in tapdisk entering and exiting polling mode very
often. This is in fact a lot slower than just staying in event mode, or
permanently staying in polling mode when measured with a Phoronix SQLite
benchmark.

Testing showed that a 6ms value gives good benchmark results (~40%
faster than the default event-based mode), and a 3ms value gives bad
benchmark results (50% slower than event-based mode).

Increase the default polling duration to be a multiple of the wakeup
granularity on 4.19, which should work with both 4.19 kernels and newer.
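
The arithmetic behind "a multiple of the wakeup granularity" can be
sketched as follows. The helper names are hypothetical and not part of
xapi; the only inputs taken from this commit are CONFIG_HZ=250 and the
polling durations discussed above.

```ocaml
(* Hypothetical helpers, not part of xapi: tick arithmetic only. *)

(* Length of one scheduler tick in microseconds for a given CONFIG_HZ. *)
let tick_us ~hz = 1_000_000 / hz

(* Number of whole ticks that fit in a polling duration given in us. *)
let whole_ticks ~hz duration_us = duration_us / tick_us ~hz

(* True when the duration is a non-zero exact multiple of the tick. *)
let is_tick_multiple ~hz duration_us =
  duration_us > 0 && duration_us mod tick_us ~hz = 0

let () =
  List.iter
    (fun d ->
      Printf.printf "%d us: %d whole tick(s), exact multiple: %b\n" d
        (whole_ticks ~hz:250 d)
        (is_tick_multiple ~hz:250 d))
    [1000; 3000; 6000; 8000]
```

With CONFIG_HZ=250 one tick is 4000 µs, so the old 1000 µs default is
shorter than a single tick and the new 8000 µs default is exactly two
ticks. This only checks the arithmetic; it does not model why 6ms also
benchmarked well.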

Eventually we should measure the optimal value, i.e. the point at which
the overhead of entering/exiting polling mode no longer exceeds the
cost of staying in event mode. (This is similar to how entering and
exiting C-states needs to take entry and exit latencies into account,
except that in this case those latencies are not known.)
diff --git a/ocaml/xapi/xapi_globs.ml b/ocaml/xapi/xapi_globs.ml
@@ -741,7 +741,7 @@ let ha_default_timeout_base = ref 60.
 let guest_liveness_timeout = ref 300.
 
 (** The default time, in µs, in which tapdisk3 will keep polling the vbd ring buffer in expectation for extra requests from the guest *)
-let default_vbd3_polling_duration = ref 1000
+let default_vbd3_polling_duration = ref 8000
 
 (** The default % of idle dom0 cpu above which tapdisk3 will keep polling the vbd ring buffer *)
 let default_vbd3_polling_idle_threshold = ref 50
diff --git a/scripts/xapi.conf b/scripts/xapi.conf
@@ -329,7 +329,7 @@ sm-plugins=ext nfs iscsi lvmoiscsi dummy file hba rawhba udev iso lvm lvmohba lv
 # ha_monitor_interval = 20
 
 # Unconditionally replan every once in a while just in case the overcommit
-# protection is buggy and we don't notice 
+# protection is buggy and we don't notice
 # ha_monitor_plan_interval = 1800
 
 # ha_monitor_startup_timeout = 1800
@@ -371,7 +371,7 @@ sm-plugins=ext nfs iscsi lvmoiscsi dummy file hba rawhba udev iso lvm lvmohba lv
 
 # The default time, in µs, in which tapdisk3 will keep polling the
 # vbd ring buffer in expectation for extra requests from the guest
-# default-vbd3-polling-duration = 1000
+# default-vbd3-polling-duration = 8000
 
 # The default % of idle dom0 cpu above which tapdisk3 will keep polling
 # the vbd ring buffer
@@ -386,7 +386,7 @@ sm-plugins=ext nfs iscsi lvmoiscsi dummy file hba rawhba udev iso lvm lvmohba lv
 # evacuation-batch-size = 10
 # number of VMs migrated in parallel in Host.evacuate
 
-# How often tracing will export spans to endpoints 
+# How often tracing will export spans to endpoints
 # export-interval = 30.
 
 # The file to check if host reboot required