Skip to content

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented Sep 4, 2025

Usage

This is an optional new command line option to use with --connection.

--connection-timeout <timeout>    When using --connection, the number of seconds to
        wait for new connections after the server has started and after all clients
        have disconnected. Each new connection will reset the timeout. When the
        timeout is reached, the server will be closed and the process will exit.
        Not specifying this argument or specifying non-positive values will cause
        the server to wait for new connections indefinitely.

A corresponding extension setting Connection Timeout is added to the lldb-dap VS Code extension.

Benefits

Automatic release of resources when lldb-dap is no longer being used (e.g. release memory used by module cache).

Test

Added 3 API tests.

$ ninja lldb-api-test-deps
$ bin/llvm-lit -a ../llvm-project/lldb/test/API/tools/lldb-dap/server/TestDAP_server.py
...
Ran 6 tests in 24.928s
...
Total Discovered Tests: 1
  Passed: 1 (100.00%)

Before adding the tests, there were already 3 tests and they took 11.7s to run.

Ran 3 tests in 11.703s

@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-lldb

Author: Roy Shi (royitaqi)

Changes

Usage

--time-to-live &lt;ttl&gt;    When using --connection, the number of milliseconds to wait for new connections
                        at the beginning and after all clients have disconnected. Not specifying this
                        argument or specifying non-positive values will wait indefinitely.

Benefits

Automatic release of resources when lldb-dap is no longer being used (e.g. release memory used by module cache).

Test

TBD. I will find a test file to add tests.


Full diff: https://github.com/llvm/llvm-project/pull/156803.diff

2 Files Affected:

  • (modified) lldb/tools/lldb-dap/Options.td (+7)
  • (modified) lldb/tools/lldb-dap/tool/lldb-dap.cpp (+51-2)
diff --git a/lldb/tools/lldb-dap/Options.td b/lldb/tools/lldb-dap/Options.td
index 867753e9294a6..754b8c7d03568 100644
--- a/lldb/tools/lldb-dap/Options.td
+++ b/lldb/tools/lldb-dap/Options.td
@@ -61,3 +61,10 @@ def pre_init_command: S<"pre-init-command">,
 def: Separate<["-"], "c">,
   Alias<pre_init_command>,
   HelpText<"Alias for --pre-init-command">;
+
+def time_to_live: S<"time-to-live">,
+      MetaVarName<"<ttl>">,
+      HelpText<"When using --connection, the number of milliseconds to wait "
+        "for new connections at the beginning and after all clients have "
+        "disconnected. Not specifying this argument or specifying "
+        "non-positive values will wait indefinitely.">;
diff --git a/lldb/tools/lldb-dap/tool/lldb-dap.cpp b/lldb/tools/lldb-dap/tool/lldb-dap.cpp
index b74085f25f4e2..8b53e4d5cda83 100644
--- a/lldb/tools/lldb-dap/tool/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/tool/lldb-dap.cpp
@@ -258,7 +258,7 @@ validateConnection(llvm::StringRef conn) {
 static llvm::Error
 serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
                 Log *log, const ReplMode default_repl_mode,
-                const std::vector<std::string> &pre_init_commands) {
+                const std::vector<std::string> &pre_init_commands, int ttl) {
   Status status;
   static std::unique_ptr<Socket> listener = Socket::Create(protocol, status);
   if (status.Fail()) {
@@ -283,6 +283,21 @@ serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
     g_loop.AddPendingCallback(
         [](MainLoopBase &loop) { loop.RequestTermination(); });
   });
+  static MainLoopBase::TimePoint ttl_time_point;
+  static std::mutex ttl_mutex;
+  if (ttl > 0) {
+    std::scoped_lock<std::mutex> lock(ttl_mutex);
+    MainLoopBase::TimePoint future =
+        std::chrono::steady_clock::now() + std::chrono::milliseconds(ttl);
+    ttl_time_point = future;
+    g_loop.AddCallback(
+        [future](MainLoopBase &loop) {
+          if (ttl_time_point == future) {
+            loop.RequestTermination();
+          }
+        },
+        future);
+  }
   std::condition_variable dap_sessions_condition;
   std::mutex dap_sessions_mutex;
   std::map<MainLoop *, DAP *> dap_sessions;
@@ -291,6 +306,12 @@ serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
                                           &dap_sessions_mutex, &dap_sessions,
                                           &clientCount](
                                              std::unique_ptr<Socket> sock) {
+    if (ttl > 0) {
+      // Reset the keep alive timer, because we won't be killing the server
+      // while this connection is being served.
+      std::scoped_lock<std::mutex> lock(ttl_mutex);
+      ttl_time_point = MainLoopBase::TimePoint();
+    }
     std::string client_name = llvm::formatv("client_{0}", clientCount++).str();
     DAP_LOG(log, "({0}) client connected", client_name);
 
@@ -327,6 +348,23 @@ serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
       std::unique_lock<std::mutex> lock(dap_sessions_mutex);
       dap_sessions.erase(&loop);
       std::notify_all_at_thread_exit(dap_sessions_condition, std::move(lock));
+
+      if (ttl > 0) {
+        // Start the countdown to kill the server at the end of each connection.
+        std::scoped_lock<std::mutex> lock(ttl_mutex);
+        MainLoopBase::TimePoint future =
+            std::chrono::steady_clock::now() + std::chrono::milliseconds(ttl);
+        // We don't need to take the max of `keep_alive_up_to` and `future`,
+        // because `future` must be the latest.
+        ttl_time_point = future;
+        g_loop.AddCallback(
+            [future](MainLoopBase &loop) {
+              if (ttl_time_point == future) {
+                loop.RequestTermination();
+              }
+            },
+            future);
+      }
     });
     client.detach();
   });
@@ -509,6 +547,17 @@ int main(int argc, char *argv[]) {
   }
 
   if (!connection.empty()) {
+    int ttl = 0;
+    llvm::opt::Arg *time_to_live = input_args.getLastArg(OPT_time_to_live);
+    if (time_to_live) {
+      llvm::StringRef time_to_live_value = time_to_live->getValue();
+      if (time_to_live_value.getAsInteger(10, ttl)) {
+        llvm::errs() << "'" << time_to_live_value
+                      << "' is not a valid time-to-live value\n";
+        return EXIT_FAILURE;
+      }
+    }
+
     auto maybeProtoclAndName = validateConnection(connection);
     if (auto Err = maybeProtoclAndName.takeError()) {
       llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(),
@@ -520,7 +569,7 @@ int main(int argc, char *argv[]) {
     std::string name;
     std::tie(protocol, name) = *maybeProtoclAndName;
     if (auto Err = serveConnection(protocol, name, log.get(), default_repl_mode,
-                                   pre_init_commands)) {
+                                   pre_init_commands, ttl)) {
       llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(),
                                   "Connection failed: ");
       return EXIT_FAILURE;

@JDevlieghere
Copy link
Member

Can you talk a bit about what's motivating this change? I'm a little worried about the proliferation of new options and I'm especially wary about the ones that need to be specified on the command line instead of over the protocol. Every new option extends the matrix of things we need to support.

  • Can this be a UI setting instead of a command line option?
  • Milliseconds seems like the wrong granularity for this . Seconds or minutes seems like a more meaningful order of magnitude.
  • This definitely needs a test. Unfortunately, tests that rely on timing tend to be less reliable, but here there should be a lower bound (e.g. exceeds X seconds) rather than an upper bound, so hopefully that should be fine.

@JDevlieghere JDevlieghere requested a review from ashgti September 4, 2025 16:28
@royitaqi
Copy link
Contributor Author

royitaqi commented Sep 4, 2025

@JDevlieghere Thanks for the good questions. I appreciate all of them. Below are my thoughts for discussion.

proliferation of new options
motivation

Main motivation is memory pressure. Other ways to counter memory pressure is either to monitor from lldb-dap VS Code extension and SIGINT for lldb-dap process to stop (drawback: IIUC this will terminate existing connections), OR use "memory pressure detected" from within lldb-dap process (drawback: the timing is tricky).

Side product is that the periodical restart will also clean up bad states that may happen.

command line option vs. over the protocol vs. VS Code settings

IIUC, there are two aspects to it: lifetime, and where to live. The following is my thoughts but no strong opinion.

Lifetime of the option: I feel this option is better for the lifetime of the lldb-dap process, because it should be consistent across different connections. Or maybe I'm missing a scenario where per-connection or per-target value works better.

I think the above decides "where does the option live". If we decide that the option is better for the whole process, then I think we need both a command line argument and a VS Code setting. If it's going to be per connection, then we need a VS Code setting.

EDIT: Did you mean that we can implement this in the lldb-dap extension instead of here in lldb-dap binary? We have use cases where we start lldb-dap binary either manually or through other scripts, so it's good to have this functionality supported into lldb-dap binary (and we can have a setting in the lldb-dap extension which passes down into this command line option).

milliseconds

Yeah I thought seconds/minutes will be more intuitive, too.

FWIW, did a calculation, max int in milliseconds is about 24.8 days. I don't think anyone would want that long of a TTL, so the range seems big enough. But yeah, seconds/minutes would probably make more sense (who is gonna use <1s TTL, right?).

Updated the patch. Now it's seconds.

This definitely needs a test

Yes, definitely (see "I will find a test file to add tests." in PR description).

Yeah probably set TTL to be a 1 second and wait for it to terminate correctly within 3 seconds.

@royitaqi
Copy link
Contributor Author

royitaqi commented Sep 4, 2025

@walter-erquinigo and @ashgti: Thank you for your review. I think most of your comments have been addressed in c78a38f and 13b1358. Feel free to reopen the comments if you want me to address them in a different way.

--

@walter-erquinigo: I will first finish the discussion with @JDevlieghere , and if we are aligned on adding a VS Code setting for the lldb-dap extension, I will implement that. At that time, I will also add to the lldb-dap documentation. SG?

BTW, I just checked, there is no such section for VS Code settings in that doc. Which section did you want me to add into?

@JDevlieghere
Copy link
Member

Main motivation is memory pressure. Other ways to counter memory pressure is either to monitor from lldb-dap VS Code extension and SIGINT for lldb-dap process to stop (drawback: IIUC this will terminate existing connections), OR use "memory pressure detected" from within lldb-dap process (drawback: the timing is tricky).

We have a memory monitor in DAP that should take care of that. Unfortunately the metrics are pretty different across platforms so making this configurable might be tricky. That said, it would be interesting to understand why that's not kicking in.

command line option vs. over the protocol vs. VS Code settings

IIUC, there are two aspects to it: lifetime, and where to live. The following is my thoughts but no strong opinion.

Lifetime of the option: I feel this option is better for the lifetime of the lldb-dap process, because it should be consistent across different connections. Or maybe I'm missing a scenario where per-connection or per-target value works better.

I think the above decides "where does the option live". If we decide that the option is better for the whole process, then I think we need both a command line argument and a VS Code setting. If it's going to be per connection, then we need a VS Code setting.

Fair enough. We could implement something where each connection specifies its own TTL but that feels very overengineered. Given that it's tied to the server mode, I can see the argument for a command line option.

EDIT: Did you mean that we can implement this in the lldb-dap extension instead of here in lldb-dap binary? We have use cases where we start lldb-dap binary either manually or through other scripts, so it's good to have this functionality supported into lldb-dap binary (and we can have a setting in the lldb-dap extension which passes down into this command line option).

No, but I do think this should be configurable from the extension, like how folks can pick server mode there.

TL;DR You convinced me that this warrants being a command line option.

Minor feedback:

  • At the risk of entering bike-shedding territory, would something like "connection-timeout" be a better name for this option? When I hear TTL I associate it with data that expires (e.g. in the context of DNS and networking).
  • Could you catch the situation where this option is provided but connection is not and print a warning saying that essentially says this only works in server mode?

@royitaqi
Copy link
Contributor Author

royitaqi commented Sep 4, 2025

@JDevlieghere: Thank you for your reply. I appreciate it a lot.

I will:

  1. Learn about the memory monitor in DAP.
    • Context: It's not failing - We are just starting to experiment with the server mode. This connection timeout feature is a precaution before we can do large scale rollout. Some folks liked the concept of auto-cleanup of the lldb-dap process, just to have the peace of mind that the memory issue isn't caused by the debugger. Good to know about the memory monitor in DAP. That adds more confidence.
  2. Rename the option to "connection-timeout". I like this name a lot, esp. because the other option is "--connection", so this has a nice synergy in the names.
  3. Catch that case of --connection-timeout is given while --connection isn't. Print a warning.
  4. Add extension settings (similar to "Server Mode") and pass that down to --connection-timeout when starting the server.
  5. Add tests.

--

(minor clarification:)

I do think this should be configurable from the extension, like how folks can pick server mode there.

I wonder if you have any preference about whether this should be in this same patch, or should be a separate follow-up. I have no strong opinion (shouldn't be too much work), so I will just go with whichever you prefer.

@walter-erquinigo
Copy link
Member

connection-timeout seems to be a better name. I'll wait for the current discussions to finish before reviewing again

Copy link
Contributor

@jeffreytan81 jeffreytan81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the comment I left.

@royitaqi
Copy link
Contributor Author

royitaqi commented Sep 8, 2025

@JDevlieghere, @walter-erquinigo: I have updated the patch according to your suggestions. LMK if it looks alright.

Done changes:

  1. Renamed the command line option to --connection-timeout.
  2. Printed a warning and exit lldb-dap when --connection-timeout is given and --connection isn't.
  3. Added an extension setting Connection Timeout and pass that down to --connection-timeout when starting the server.
  4. Added api tests.

Caveats:

  • The existing tests in TestDAP_server.py don't test with concurrent clients. My newly added tests don't either.
  • Wanted to update lldb/tools/lldb-dap/README.md, but realized that the "Server Mode" isn't mentioned there. @walter-erquinigo LMK if you want me to add a section to mention both (if so, where in the doc and suggestive new section name?).

@royitaqi royitaqi changed the title [lldb-dap] Add new optional argument time-to-live when using --connection [lldb-dap] Add new optional command line argument --connection-timeout to use with --connection Sep 8, 2025
@royitaqi royitaqi changed the title [lldb-dap] Add new optional command line argument --connection-timeout to use with --connection [lldb-dap] Add command line argument --connection-timeout Sep 8, 2025
@royitaqi royitaqi changed the title [lldb-dap] Add command line argument --connection-timeout [lldb-dap] Add command line option --connection-timeout Sep 8, 2025
@walter-erquinigo
Copy link
Member

@royitaqi , please add a section in the README about the server mode

Copy link
Member

@walter-erquinigo walter-erquinigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the documentation and i'll approve this :)
I also requested a minor change

std::tie(protocol, name) = *maybeProtoclAndName;
if (auto Err = serveConnection(protocol, name, log.get(), default_repl_mode,
pre_init_commands)) {
if (auto Err =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use auto

Copy link
Contributor Author

@royitaqi royitaqi Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the coding convention says that auto can/should be used when it's obvious from the context what the actual type is. IMHO the type here is obvious (llvm::Error). With that said, I can spell it out in my next update.

EDIT: Actually, the file contains multiple auto Err, and so I think it's better to leave them as auto to be consistent. I can do a second PR to change all of them if you have a strong opinion about it.

FWIW, that auto was from existing code. For that line, I just added a parameter and it caused a formatting change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh okay, that's fine.
Well, for lldb-dap llvm::Error is kind of obvious, but not for all llvm code, where llvm::ErrorOr can be used as well.
Anyway, all good with this change

@royitaqi
Copy link
Contributor Author

royitaqi commented Sep 9, 2025

@walter-erquinigo: Added documentation. Also fixed an error in existing documentation about how to open the Settings page. LMK if it looks alright (if not, feel free to give me the specific change that you want me to put in).

Also LMK if you are okay with skipping the auto change. See my comment.

Copy link
Member

@walter-erquinigo walter-erquinigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome documentation!

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@royitaqi royitaqi merged commit acb38c8 into llvm:main Sep 10, 2025
9 checks passed
@royitaqi
Copy link
Contributor Author

royitaqi commented Sep 10, 2025

Latest update: This has been fixed in #157924 and the buildbot is green now.

-- (below are older comments) --

I'm seeing a buildbot failure. https://lab.llvm.org/buildbot/#/changes/52237

Rerunning it while double checking locally.

EDIT:
Locally tested on macOS and Linux (CentOS). Both passed.

Here is the error log:
https://gist.github.com/royitaqi/f24f7a9f85d699681627d0157df706e2

  File "/home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/tools/lldb-dap/server/TestDAP_server.py", line 30, in cleanup
    process.wait(wait_seconds_for_termination)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/subprocess.py", line 1280, in wait
    return self._wait(timeout=timeout)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/subprocess.py", line 2058, in _wait
    raise TimeoutExpired(self.args, timeout)

I'm guessing the timeout in the new tests are too small and doesn't allow a complete termination of the server on Debian. I will fix forward (i.e. create a new patch to make the timeout larger).

royitaqi added a commit that referenced this pull request Sep 10, 2025
See
#156803 (comment)
for full context.

Summary:
* In #156803, we see a builtbot
failure for `TestDAP_server.py` on Debian. Note that macOS and other
Linux distributions (like CentOS, or whatever Linux the [required
buildbot](https://github.com/llvm/llvm-project/actions/runs/17594257911/job/49982560193#logs)
uses) seem to pass the tests.
* In the 3 newly added tests, the most complex test case passed, while
the other easier ones failed.
```
PASS: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_multiple_sessions (TestDAP_server.TestDAP_server.test_connection_timeout_multiple_sessions)
FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_long_debug_session (TestDAP_server.TestDAP_server.test_connection_timeout_long_debug_session)
FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_at_server_start (TestDAP_server.TestDAP_server.test_connection_timeout_at_server_start)
```
* The error is that `process.wait(timeout)` timed out during the
teardown of the tests.
* The above suggests that maybe the root cause is that the timeout is
set too strictly (and that maybe the server termination on Debian is
slower than the other Linux distribution for some reason).
* This patch loosens that timeout from 2s to 5s. Let's see if this
works.
* FWIW, I cannot verify the root cause, because I don't have a Debian
machine.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 10, 2025
See
llvm/llvm-project#156803 (comment)
for full context.

Summary:
* In llvm/llvm-project#156803, we see a builtbot
failure for `TestDAP_server.py` on Debian. Note that macOS and other
Linux distributions (like CentOS, or whatever Linux the [required
buildbot](https://github.com/llvm/llvm-project/actions/runs/17594257911/job/49982560193#logs)
uses) seem to pass the tests.
* In the 3 newly added tests, the most complex test case passed, while
the other easier ones failed.
```
PASS: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_multiple_sessions (TestDAP_server.TestDAP_server.test_connection_timeout_multiple_sessions)
FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_long_debug_session (TestDAP_server.TestDAP_server.test_connection_timeout_long_debug_session)
FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_at_server_start (TestDAP_server.TestDAP_server.test_connection_timeout_at_server_start)
```
* The error is that `process.wait(timeout)` timed out during the
teardown of the tests.
* The above suggests that maybe the root cause is that the timeout is
set too strictly (and that maybe the server termination on Debian is
slower than the other Linux distribution for some reason).
* This patch loosens that timeout from 2s to 5s. Let's see if this
works.
* FWIW, I cannot verify the root cause, because I don't have a Debian
machine.
JDevlieghere pushed a commit to swiftlang/llvm-project that referenced this pull request Oct 14, 2025
# Usage

This is an optional new command line option to use with `--connection`.

```
--connection-timeout <timeout>    When using --connection, the number of seconds to
        wait for new connections after the server has started and after all clients
        have disconnected. Each new connection will reset the timeout. When the
        timeout is reached, the server will be closed and the process will exit.
        Not specifying this argument or specifying non-positive values will cause
        the server to wait for new connections indefinitely.
```

A corresponding extension setting `Connection Timeout` is added to the
`lldb-dap` VS Code extension.

# Benefits

Automatic release of resources when lldb-dap is no longer being used
(e.g. release memory used by module cache).

# Test

See PR.

(cherry picked from commit acb38c8)
JDevlieghere pushed a commit to swiftlang/llvm-project that referenced this pull request Oct 14, 2025
See
llvm#156803 (comment)
for full context.

Summary:
* In llvm#156803, we see a builtbot
failure for `TestDAP_server.py` on Debian. Note that macOS and other
Linux distributions (like CentOS, or whatever Linux the [required
buildbot](https://github.com/llvm/llvm-project/actions/runs/17594257911/job/49982560193#logs)
uses) seem to pass the tests.
* In the 3 newly added tests, the most complex test case passed, while
the other easier ones failed.
```
PASS: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_multiple_sessions (TestDAP_server.TestDAP_server.test_connection_timeout_multiple_sessions)
FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_long_debug_session (TestDAP_server.TestDAP_server.test_connection_timeout_long_debug_session)
FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_at_server_start (TestDAP_server.TestDAP_server.test_connection_timeout_at_server_start)
```
* The error is that `process.wait(timeout)` timed out during the
teardown of the tests.
* The above suggests that maybe the root cause is that the timeout is
set too strictly (and that maybe the server termination on Debian is
slower than the other Linux distribution for some reason).
* This patch loosens that timeout from 2s to 5s. Let's see if this
works.
* FWIW, I cannot verify the root cause, because I don't have a Debian
machine.

(cherry picked from commit 11a4f5b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants