Skip to content

Conversation

stephenchengCloud
Copy link
Collaborator

@stephenchengCloud stephenchengCloud commented Sep 14, 2025

Note that this is a draft PR for initial review, though I've tested it with both dom0 and vm console, as:

  1. This change is substantial and requires unit tests
  2. Before investing time in comprehensive unit tests, I want to ensure the implementation approach is generally correct
  3. Added logs for testing and will be removed

Tested:
image

Key Features

  • RFB_msg_type_parser Module
    -- A simple parser: For each newly received data, only parses the message type to detect user activity
    -- Activity detection: Identifies KeyEvent (4) and PointerEvent (5) messages as user activity
    -- Unknown message handling: Skips parsing for unknown message types and continues processing

  • Console_idle_monitor Module
    -- Monitors console sessions for idle timeout based on configuration, using RFB_msg_type_parser for activity detection

  • Integration
    -- Hooks into Unixext.proxy via a closure callback mechanism
    -- No breaking changes to existing console functionalit

@stephenchengCloud stephenchengCloud marked this pull request as draft September 14, 2025 10:45
@stephenchengCloud stephenchengCloud force-pushed the private/stephenche/CP-54393 branch from 11f30ba to 8f35a91 Compare September 14, 2025 10:47
@stephenchengCloud stephenchengCloud force-pushed the private/stephenche/CP-54393 branch from 8f35a91 to 0673030 Compare September 15, 2025 00:24
Key Features
- RFB_msg_type_parser Module
A simple parser: For each newly received data, only parses the message type to detect user activity
Activity detection: Identifies KeyEvent (4) and PointerEvent (5) messages as user activity
Unknown message handling: Skips parsing for unknown message types and continues processing

- Console_idle_monitor Module
Monitors console sessions for idle timeout based on configuration, using RFB_msg_type_parser for activity detection

- Integration
Hooks into Unixext.proxy via a closure callback mechanism
No breaking changes to existing console functionalit

Signed-off-by: Stephen Cheng <[email protected]>
@stephenchengCloud stephenchengCloud force-pushed the private/stephenche/CP-54393 branch from 0673030 to 94b5eeb Compare September 15, 2025 01:00
; poll_event_timeout_ms= -1 (* wait indefinitely *)
}

let proxy ?(console_init_state = default_vnc_console_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this module is not the place to put any code with knowledge of a VNC console

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proxy needs to get the configs, at least, is_idle_timeout_set, because we don't want to change the behaviour when idle timeout is not set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then proxy should be elsewhere, or made generic it does not care about vnc at all so it can be kept here. VNC is not a unix concept

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this function needs any state. I was expecting a new parameter of a function that takes newly read data as an argument and returns a Boolean that indicates whether to hang up or not. All the other details can be handled by the console-specific code. In other words, this proxy function should not know anything about "console", "vnc" or "idle". The polly timeout could also be a plain parameter of proxy (not inside a record).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both for the comments.

  1. I'll make polly timeout a separate parameter for proxy
  2. I'll make the callback empty if the idle_timeout is not set, so the behaviours keep unchanged as before.

until an event occurs.
If set to a positive value, the poll will wake up periodically,
which is useful for implementing features like idle timeout or periodic inspection of proxy buffers. *)
let proxy_poll_event_timeout_ms = ref 5000
Copy link
Member

@psafont psafont Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing milliseconds to users means exposing yet another time duration in a a non-standard unit, use Mtime.t (In the CLI ShortDurationFromSeconds) so users can input fractional values in seconds and retain a consistent interface. The value can be converted to milliseconds at the point the value is fed to the Polly library

let oc = open_out_gen [Open_creat; Open_text; Open_append] 0o644 file in
let time_str =
let tm = Unix.localtime (Unix.time ()) in
Printf.sprintf "%04d-%02d-%02d %02d:%02d:%02d" (tm.tm_year + 1900)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use Ptime.to_rfc3339

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log will be removed like I mentioned in the commit message.

; ( "proxy_poll_event_timeout_ms"
, Arg.Set_int proxy_poll_event_timeout_ms
, (fun () -> string_of_int !proxy_poll_event_timeout_ms)
, "Timeout (in milliseconds) for event polling in the proxy loop."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who are we expecting to understand this? I assume this is mostly for developers. This name needs to reflect what proxy we we talking about - the connection with VNC is missing.

; Unixext.poll_event_timeout_ms
}
in
log_in_file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to use non-standard logging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log is for testing and will be removed.

let proxy (a : Unix.file_descr) (b : Unix.file_descr) =
exception Idle_timeout

type vnc_console_state = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified to avoid ambiguous state?

  • use an option to encode the timeout
  • instead of is idle, remember the timestamp of last activity. Then it is idle, if this is longer than a given timeout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move these state to console.ml.

if read = 0 then x.r_closed <- true ;
x.len <- x.len + read

(* Read data from the fd to the buffer and return the newly read data *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in some way re-implementing buffered IO as it is available from in_chan? Unbuffered IO (Unix.read) already reads what is available into a buffer and returns the number of bytes it has read. What problem is this code solving?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
My intention was to implement a separate function that both read and return the data for idle_time_out_callback.
Given your and Rob's comments, yes, better to modify the existing read to return buffer and offset.

( if start_pos + newly_read <= buffer_size then
(* No wraparound - single blit *)
Bytes.blit x.buffer start_pos result 0 newly_read
else (* Wraparound - two blits *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at read, I don't think it will wrap around, due to the way len is defined. And it only does a single Unix.read.

Given that, I think that this function should just return a substring of x.buffer with Bytes.sub. Even better might be to avoid copying all the bytes and just return buffer + offset + len. And in that case, you could just return that from read directly (no need for a separate read_and_return function).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's much better to re-use the exsiting read and return buffer + offset.

; poll_event_timeout_ms= -1 (* wait indefinitely *)
}

let proxy ?(console_init_state = default_vnc_console_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this function needs any state. I was expecting a new parameter of a function that takes newly read data as an argument and returns a Boolean that indicates whether to hang up or not. All the other details can be handled by the console-specific code. In other words, this proxy function should not know anything about "console", "vnc" or "idle". The polly timeout could also be a plain parameter of proxy (not inside a record).

(fun _polly fd events () ->
(* Do the writing before the reading *)
if Polly.Events.(test out events) then
if a = fd then CBuf.write b' a else CBuf.write a' b ;
Copy link
Member

@minglumlu minglumlu Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to call the stateful checker here so that we don't need to handle the buffer in unixext.ml at all.
E.g.

if a = fd then CBuf.write b' a
else (
  if in_check a' then CBuf.write a' b else raise In_check_error
) ;

Note the in_check must not change the buffer a'.
By default, the in_check could be Fn.const true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.
We must make sure the buffer isn't modified by the checker.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it again, this approach doesn't work. The reason is that we use the buf's start, len value in the checker to parse data that hasn't been sent to the server yet. However, after the write operation completes, it may not have written all the buffer data to the server. So when the checker checks again next time, we would be parsing the remain data from before repeatedly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Indeed, the write may not get all data in buffer sent out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's follow Rob's suggestion, make the read return extra offset and len for the bytes read this time.

This prevents the parser from getting stuck on protocol variations while
ensuring we can still achieve our primary goal: detecting KeyEvent and
PointerEvent messages for activity monitoring. *)
let progress_handshake state data_len offset =
Copy link
Member

@minglumlu minglumlu Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Handshake flow (and maybe Security, ClientInit and other messages, even the length-variable ones) could be encoded with angstrom. The lib is quite handy. We can write a parser for each type of messages. And it seems that the angstrom can handle the cases like Partial, Done and Fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have to skip over the other messages. I have not looked at this in detail but decoding only some messages will only be enough if ignoring the others by knowing their length is possible without fully decoding them.

`Skip_All

(* Different RFB versions have variations in their handshake protocols,
so we implement a simplified handshake parser that handles the most common cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least we should distinguish the supported and unsupported versions and turn off the feature for those unsupported versions.

let has_activity =
List.exists
(fun msg_type -> msg_type = 4 || msg_type = 5)
simple_rfb_parser.received_msg_types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic could be as simple (two lines maybe) as:

  1. refresh the state with a newMtime_clock.counter () when parsed a KeyEvent or PointerEven message.
  2. after all parsing in one callback call, check the elapsed time with Mtime_clock.count against the idle timeout config.

module RFB_msg_type_parser = struct
type state = {
mutable rfb_phase: [`Handshake | `Security | `ClientInit | `Messages]
; mutable incomplete_msg_remaining: int
Copy link
Member

@minglumlu minglumlu Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be bytes for several following messages. So it could be called as unconsumed bytes. And I think here is the right place to store them. But it's better to store only the ones which can't be consumed within one call.

let pool = Helpers.get_pool ~__context in
Db.Pool.get_vm_console_idle_timeout ~__context ~self:pool
in
let poll_event_timeout_ms =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in proxy rather than console/vnc.

(* https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst *)
module RFB_msg_type_parser = struct
type state = {
mutable rfb_phase: [`Handshake | `Security | `ClientInit | `Messages]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use variants instead of polymorphic variants?
And I believe there is one phase No_parsing for the case that parsing failed. It's impossible to recover. The session should be exempted from the idle timeout checking.

@minglumlu
Copy link
Member

The parsing of RFB protocol can and shall be covered by unit test.

@stephenchengCloud
Copy link
Collaborator Author

The parsing of RFB protocol can and shall be covered by unit test.

As I mentioned in the commit message, I was planning to write unit tests from the beginning. I just wanted you all to take a preliminary look at whether the approach is feasible before writing them.
How about this: I'll first write the parser module and the parser's unit tests, and submit them as a separate PR. Then I'll handle the idle_timeout logic afterwards. This way it will also be easier for you to review.
@minglumlu @robhoes @lindig @psafont

@lindig
Copy link
Contributor

lindig commented Sep 17, 2025

I am fine with this. I had this other idea that we could also use a heuristic: a connection is idle when the amount of data flowing from the client to VNC is below a value; and if this happens for the timeout period, we cut it. Just to avoid dealing with the protocol.

@robhoes
Copy link
Member

robhoes commented Sep 17, 2025

is idle when the amount of data flowing from the client to VNC is below a value;

Does an "idle" connection have a very predictable rate of incoming packets, e.g. request every x seconds? Could we set a threshold just above that? I guess it depends on how much traffic a non-idle but lightly used connection is expected to generate, as we don't want to cut off too eagerly, but can't set the threshold too low either

@stephenchengCloud
Copy link
Collaborator Author

From what I observed, the data flow rate from the client depends on the applications running on the client:

  • Case 1: A windows 10, runnig an application that needs to show the current time (2025-09-18 03:25:16)
image

Then the data from the client to the server is like this, updated every second:
Msg Type=3 is FramebufferUpdateRequest

[2025-09-18 03:25:13] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:14] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:15] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:16] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:17] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:22] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:23] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:24] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:25] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:26] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:32] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
  • Case 2: Then I closed the applications and left only an empty desktop.
    I noticed there's a clock updated every minute in the corner.
image

Then the data from the client to the server is like this, not updated every second. But it's unpredictable.

[2025-09-18 03:25:54] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:25:55] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:26:00] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:26:00] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:26:46] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:26:49] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:27:01] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:27:49] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:27:54] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:28:00] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:28:52] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:29:00] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:29:00] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:29:56] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:30:01] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:30:04] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:30:58] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:31:01] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:31:09] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:32:02] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:32:14] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:33:02] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:33:04] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:33:20] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:34:02] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
[2025-09-18 03:34:25] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 4; 0; 3; 0]
  • Case 3: A dom0 console showing an empty shell.
    The data is like this, updated every 6 seconds.
[2025-09-18 02:50:29] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:50:35] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:50:41] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:50:48] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:50:54] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:51:00] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:51:06] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:51:13] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:51:19] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:51:25] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]
[2025-09-18 02:51:32] [proxy] Read 10 bytes, Msg Type=3, bytes=[3; 1; 0; 0; 0; 0; 2; 128; 1; 128]

I believe the data rate is also rely on the network latency and dom0's workload.
Therefore, it's hard to find a proper threshold.

Let's write a simple rfb parser first. From the data I observed, the message sent from the client to the server is limited to a few types, so it shouldn't be difficult to handle.

@stephenchengCloud
Copy link
Collaborator Author

RFB parser is ready to review: #6679

Close this PR for now, will reopen it after RFB parser is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants