-
Notifications
You must be signed in to change notification settings - Fork 292
CP-54393: VNC Console Idle Timeout #6667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-54393: VNC Console Idle Timeout #6667
Conversation
11f30ba
to
8f35a91
Compare
8f35a91
to
0673030
Compare
Key Features - RFB_msg_type_parser Module A simple parser: For each newly received data, only parses the message type to detect user activity Activity detection: Identifies KeyEvent (4) and PointerEvent (5) messages as user activity Unknown message handling: Skips parsing for unknown message types and continues processing - Console_idle_monitor Module Monitors console sessions for idle timeout based on configuration, using RFB_msg_type_parser for activity detection - Integration Hooks into Unixext.proxy via a closure callback mechanism No breaking changes to existing console functionalit Signed-off-by: Stephen Cheng <[email protected]>
0673030
to
94b5eeb
Compare
; poll_event_timeout_ms= -1 (* wait indefinitely *) | ||
} | ||
|
||
let proxy ?(console_init_state = default_vnc_console_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this module is not the place to put any code with knowledge of a VNC console
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proxy
needs to get the configs, at least, is_idle_timeout_set
, because we don't want to change the behaviour when idle timeout is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then proxy should be elsewhere, or made generic it does not care about vnc at all so it can be kept here. VNC is not a unix concept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this function needs any state. I was expecting a new parameter of a function that takes newly read data as an argument and returns a Boolean that indicates whether to hang up or not. All the other details can be handled by the console-specific code. In other words, this proxy
function should not know anything about "console", "vnc" or "idle". The polly timeout could also be a plain parameter of proxy
(not inside a record).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both for the comments.
- I'll make polly timeout a separate parameter for
proxy
- I'll make the callback empty if the idle_timeout is not set, so the behaviours keep unchanged as before.
until an event occurs. | ||
If set to a positive value, the poll will wake up periodically, | ||
which is useful for implementing features like idle timeout or periodic inspection of proxy buffers. *) | ||
let proxy_poll_event_timeout_ms = ref 5000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposing milliseconds to users means exposing yet another time duration in a a non-standard unit, use Mtime.t
(In the CLI ShortDurationFromSeconds
) so users can input fractional values in seconds and retain a consistent interface. The value can be converted to milliseconds at the point the value is fed to the Polly library
let oc = open_out_gen [Open_creat; Open_text; Open_append] 0o644 file in | ||
let time_str = | ||
let tm = Unix.localtime (Unix.time ()) in | ||
Printf.sprintf "%04d-%02d-%02d %02d:%02d:%02d" (tm.tm_year + 1900) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use Ptime.to_rfc3339
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log will be removed like I mentioned in the commit message.
; ( "proxy_poll_event_timeout_ms" | ||
, Arg.Set_int proxy_poll_event_timeout_ms | ||
, (fun () -> string_of_int !proxy_poll_event_timeout_ms) | ||
, "Timeout (in milliseconds) for event polling in the proxy loop." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who are we expecting to understand this? I assume this is mostly for developers. This name needs to reflect what proxy we we talking about - the connection with VNC is missing.
; Unixext.poll_event_timeout_ms | ||
} | ||
in | ||
log_in_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have to use non-standard logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log is for testing and will be removed.
let proxy (a : Unix.file_descr) (b : Unix.file_descr) = | ||
exception Idle_timeout | ||
|
||
type vnc_console_state = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified to avoid ambiguous state?
- use an option to encode the timeout
- instead of is idle, remember the timestamp of last activity. Then it is idle, if this is longer than a given timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will move these state to console.ml.
if read = 0 then x.r_closed <- true ; | ||
x.len <- x.len + read | ||
|
||
(* Read data from the fd to the buffer and return the newly read data *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this in some way re-implementing buffered IO as it is available from in_chan
? Unbuffered IO (Unix.read
) already reads what is available into a buffer and returns the number of bytes it has read. What problem is this code solving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
My intention was to implement a separate function that both read and return the data for idle_time_out_callback.
Given your and Rob's comments, yes, better to modify the existing read
to return buffer and offset.
( if start_pos + newly_read <= buffer_size then | ||
(* No wraparound - single blit *) | ||
Bytes.blit x.buffer start_pos result 0 newly_read | ||
else (* Wraparound - two blits *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at read
, I don't think it will wrap around, due to the way len
is defined. And it only does a single Unix.read
.
Given that, I think that this function should just return a substring of x.buffer
with Bytes.sub
. Even better might be to avoid copying all the bytes and just return buffer + offset + len. And in that case, you could just return that from read
directly (no need for a separate read_and_return
function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's much better to re-use the exsiting read
and return buffer + offset.
; poll_event_timeout_ms= -1 (* wait indefinitely *) | ||
} | ||
|
||
let proxy ?(console_init_state = default_vnc_console_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this function needs any state. I was expecting a new parameter of a function that takes newly read data as an argument and returns a Boolean that indicates whether to hang up or not. All the other details can be handled by the console-specific code. In other words, this proxy
function should not know anything about "console", "vnc" or "idle". The polly timeout could also be a plain parameter of proxy
(not inside a record).
(fun _polly fd events () -> | ||
(* Do the writing before the reading *) | ||
if Polly.Events.(test out events) then | ||
if a = fd then CBuf.write b' a else CBuf.write a' b ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible to call the stateful checker here so that we don't need to handle the buffer in unixext.ml
at all.
E.g.
if a = fd then CBuf.write b' a
else (
if in_check a' then CBuf.write a' b else raise In_check_error
) ;
Note the in_check
must not change the buffer a'
.
By default, the in_check
could be Fn.const true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
We must make sure the buffer isn't modified by the checker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about it again, this approach doesn't work. The reason is that we use the buf's start
, len
value in the checker to parse data that hasn't been sent to the server yet. However, after the write operation completes, it may not have written all the buffer data to the server. So when the checker checks again next time, we would be parsing the remain data from before repeatedly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Indeed, the write
may not get all data in buffer sent out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow Rob's suggestion, make the read
return extra offset
and len
for the bytes read this time.
This prevents the parser from getting stuck on protocol variations while | ||
ensuring we can still achieve our primary goal: detecting KeyEvent and | ||
PointerEvent messages for activity monitoring. *) | ||
let progress_handshake state data_len offset = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Handshake flow (and maybe Security, ClientInit and other messages, even the length-variable ones) could be encoded with angstrom. The lib is quite handy. We can write a parser for each type of messages. And it seems that the angstrom can handle the cases like Partial
, Done
and Fail
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have to skip over the other messages. I have not looked at this in detail but decoding only some messages will only be enough if ignoring the others by knowing their length is possible without fully decoding them.
`Skip_All | ||
|
||
(* Different RFB versions have variations in their handshake protocols, | ||
so we implement a simplified handshake parser that handles the most common cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least we should distinguish the supported and unsupported versions and turn off the feature for those unsupported versions.
let has_activity = | ||
List.exists | ||
(fun msg_type -> msg_type = 4 || msg_type = 5) | ||
simple_rfb_parser.received_msg_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic could be as simple (two lines maybe) as:
- refresh the state with a new
Mtime_clock.counter ()
when parsed aKeyEvent
orPointerEven
message. - after all parsing in one callback call, check the elapsed time with
Mtime_clock.count
against the idle timeout config.
module RFB_msg_type_parser = struct | ||
type state = { | ||
mutable rfb_phase: [`Handshake | `Security | `ClientInit | `Messages] | ||
; mutable incomplete_msg_remaining: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be bytes for several following messages. So it could be called as unconsumed bytes. And I think here is the right place to store them. But it's better to store only the ones which can't be consumed within one call.
let pool = Helpers.get_pool ~__context in | ||
Db.Pool.get_vm_console_idle_timeout ~__context ~self:pool | ||
in | ||
let poll_event_timeout_ms = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in proxy rather than console/vnc.
(* https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst *) | ||
module RFB_msg_type_parser = struct | ||
type state = { | ||
mutable rfb_phase: [`Handshake | `Security | `ClientInit | `Messages] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use variants
instead of polymorphic variants
?
And I believe there is one phase No_parsing
for the case that parsing failed. It's impossible to recover. The session should be exempted from the idle timeout checking.
The parsing of RFB protocol can and shall be covered by unit test. |
As I mentioned in the commit message, I was planning to write unit tests from the beginning. I just wanted you all to take a preliminary look at whether the approach is feasible before writing them. |
I am fine with this. I had this other idea that we could also use a heuristic: a connection is idle when the amount of data flowing from the client to VNC is below a value; and if this happens for the timeout period, we cut it. Just to avoid dealing with the protocol. |
Does an "idle" connection have a very predictable rate of incoming packets, e.g. request every x seconds? Could we set a threshold just above that? I guess it depends on how much traffic a non-idle but lightly used connection is expected to generate, as we don't want to cut off too eagerly, but can't set the threshold too low either |
From what I observed, the data flow rate from the client depends on the applications running on the client:
![]() Then the data from the client to the server is like this, updated every second:
![]() Then the data from the client to the server is like this, not updated every second. But it's unpredictable.
I believe the data rate is also rely on the network latency and dom0's workload. Let's write a simple rfb parser first. From the data I observed, the message sent from the client to the server is limited to a few types, so it shouldn't be difficult to handle. |
RFB parser is ready to review: #6679 Close this PR for now, will reopen it after RFB parser is merged. |
Note that this is a draft PR for initial review, though I've tested it with both dom0 and vm console, as:
Tested:

Key Features
RFB_msg_type_parser Module
-- A simple parser: For each newly received data, only parses the message type to detect user activity
-- Activity detection: Identifies KeyEvent (4) and PointerEvent (5) messages as user activity
-- Unknown message handling: Skips parsing for unknown message types and continues processing
Console_idle_monitor Module
-- Monitors console sessions for idle timeout based on configuration, using RFB_msg_type_parser for activity detection
Integration
-- Hooks into Unixext.proxy via a closure callback mechanism
-- No breaking changes to existing console functionalit