-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[lldb-dap] Add command line option --connection-timeout
#156803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lldb Author: Roy Shi (royitaqi) ChangesUsage
BenefitsAutomatic release of resources when lldb-dap is no longer being used (e.g. release memory used by module cache). TestTBD. I will find a test file to add tests. Full diff: https://github.com/llvm/llvm-project/pull/156803.diff 2 Files Affected:
diff --git a/lldb/tools/lldb-dap/Options.td b/lldb/tools/lldb-dap/Options.td
index 867753e9294a6..754b8c7d03568 100644
--- a/lldb/tools/lldb-dap/Options.td
+++ b/lldb/tools/lldb-dap/Options.td
@@ -61,3 +61,10 @@ def pre_init_command: S<"pre-init-command">,
def: Separate<["-"], "c">,
Alias<pre_init_command>,
HelpText<"Alias for --pre-init-command">;
+
+def time_to_live: S<"time-to-live">,
+ MetaVarName<"<ttl>">,
+ HelpText<"When using --connection, the number of milliseconds to wait "
+ "for new connections at the beginning and after all clients have "
+ "disconnected. Not specifying this argument or specifying "
+ "non-positive values will wait indefinitely.">;
diff --git a/lldb/tools/lldb-dap/tool/lldb-dap.cpp b/lldb/tools/lldb-dap/tool/lldb-dap.cpp
index b74085f25f4e2..8b53e4d5cda83 100644
--- a/lldb/tools/lldb-dap/tool/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/tool/lldb-dap.cpp
@@ -258,7 +258,7 @@ validateConnection(llvm::StringRef conn) {
static llvm::Error
serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
Log *log, const ReplMode default_repl_mode,
- const std::vector<std::string> &pre_init_commands) {
+ const std::vector<std::string> &pre_init_commands, int ttl) {
Status status;
static std::unique_ptr<Socket> listener = Socket::Create(protocol, status);
if (status.Fail()) {
@@ -283,6 +283,21 @@ serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
g_loop.AddPendingCallback(
[](MainLoopBase &loop) { loop.RequestTermination(); });
});
+ static MainLoopBase::TimePoint ttl_time_point;
+ static std::mutex ttl_mutex;
+ if (ttl > 0) {
+ std::scoped_lock<std::mutex> lock(ttl_mutex);
+ MainLoopBase::TimePoint future =
+ std::chrono::steady_clock::now() + std::chrono::milliseconds(ttl);
+ ttl_time_point = future;
+ g_loop.AddCallback(
+ [future](MainLoopBase &loop) {
+ if (ttl_time_point == future) {
+ loop.RequestTermination();
+ }
+ },
+ future);
+ }
std::condition_variable dap_sessions_condition;
std::mutex dap_sessions_mutex;
std::map<MainLoop *, DAP *> dap_sessions;
@@ -291,6 +306,12 @@ serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
&dap_sessions_mutex, &dap_sessions,
&clientCount](
std::unique_ptr<Socket> sock) {
+ if (ttl > 0) {
+ // Reset the keep alive timer, because we won't be killing the server
+ // while this connection is being served.
+ std::scoped_lock<std::mutex> lock(ttl_mutex);
+ ttl_time_point = MainLoopBase::TimePoint();
+ }
std::string client_name = llvm::formatv("client_{0}", clientCount++).str();
DAP_LOG(log, "({0}) client connected", client_name);
@@ -327,6 +348,23 @@ serveConnection(const Socket::SocketProtocol &protocol, const std::string &name,
std::unique_lock<std::mutex> lock(dap_sessions_mutex);
dap_sessions.erase(&loop);
std::notify_all_at_thread_exit(dap_sessions_condition, std::move(lock));
+
+ if (ttl > 0) {
+ // Start the countdown to kill the server at the end of each connection.
+ std::scoped_lock<std::mutex> lock(ttl_mutex);
+ MainLoopBase::TimePoint future =
+ std::chrono::steady_clock::now() + std::chrono::milliseconds(ttl);
+ // We don't need to take the max of `keep_alive_up_to` and `future`,
+ // because `future` must be the latest.
+ ttl_time_point = future;
+ g_loop.AddCallback(
+ [future](MainLoopBase &loop) {
+ if (ttl_time_point == future) {
+ loop.RequestTermination();
+ }
+ },
+ future);
+ }
});
client.detach();
});
@@ -509,6 +547,17 @@ int main(int argc, char *argv[]) {
}
if (!connection.empty()) {
+ int ttl = 0;
+ llvm::opt::Arg *time_to_live = input_args.getLastArg(OPT_time_to_live);
+ if (time_to_live) {
+ llvm::StringRef time_to_live_value = time_to_live->getValue();
+ if (time_to_live_value.getAsInteger(10, ttl)) {
+ llvm::errs() << "'" << time_to_live_value
+ << "' is not a valid time-to-live value\n";
+ return EXIT_FAILURE;
+ }
+ }
+
auto maybeProtoclAndName = validateConnection(connection);
if (auto Err = maybeProtoclAndName.takeError()) {
llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(),
@@ -520,7 +569,7 @@ int main(int argc, char *argv[]) {
std::string name;
std::tie(protocol, name) = *maybeProtoclAndName;
if (auto Err = serveConnection(protocol, name, log.get(), default_repl_mode,
- pre_init_commands)) {
+ pre_init_commands, ttl)) {
llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(),
"Connection failed: ");
return EXIT_FAILURE;
|
Can you talk a bit about what's motivating this change? I'm a little worried about the proliferation of new options and I'm especially wary about the ones that need to be specified on the command line instead of over the protocol. Every new option extends the matrix of things we need to support.
|
@JDevlieghere Thanks for the good questions. I appreciate all of them. Below are my thoughts for discussion.
Main motivation is memory pressure. Other ways to counter memory pressure is either to monitor from lldb-dap VS Code extension and SIGINT for lldb-dap process to stop (drawback: IIUC this will terminate existing connections), OR use "memory pressure detected" from within lldb-dap process (drawback: the timing is tricky). Side product is that the periodical restart will also clean up bad states that may happen.
IIUC, there are two aspects to it: lifetime, and where to live. The following is my thoughts but no strong opinion. Lifetime of the option: I feel this option is better for the lifetime of the lldb-dap process, because it should be consistent across different connections. Or maybe I'm missing a scenario where per-connection or per-target value works better. I think the above decides "where does the option live". If we decide that the option is better for the whole process, then I think we need both a command line argument and a VS Code setting. If it's going to be per connection, then we need a VS Code setting. EDIT: Did you mean that we can implement this in the lldb-dap extension instead of here in lldb-dap binary? We have use cases where we start lldb-dap binary either manually or through other scripts, so it's good to have this functionality supported into lldb-dap binary (and we can have a setting in the lldb-dap extension which passes down into this command line option).
Updated the patch. Now it's seconds.
Yes, definitely (see "I will find a test file to add tests." in PR description). Yeah probably set TTL to be a 1 second and wait for it to terminate correctly within 3 seconds. |
@walter-erquinigo and @ashgti: Thank you for your review. I think most of your comments have been addressed in c78a38f and 13b1358. Feel free to reopen the comments if you want me to address them in a different way. -- @walter-erquinigo: I will first finish the discussion with @JDevlieghere , and if we are aligned on adding a VS Code setting for the lldb-dap extension, I will implement that. At that time, I will also add to the lldb-dap documentation. SG? BTW, I just checked, there is no such section for VS Code settings in that doc. Which section did you want me to add into? |
We have a memory monitor in DAP that should take care of that. Unfortunately the metrics are pretty different across platforms so making this configurable might be tricky. That said, it would be interesting to understand why that's not kicking in.
Fair enough. We could implement something where each connection specifies its own TTL but that feels very overengineered. Given that it's tied to the server mode, I can see the argument for a command line option.
No, but I do think this should be configurable from the extension, like how folks can pick server mode there. TL;DR You convinced me that this warrants being a command line option. Minor feedback:
|
@JDevlieghere: Thank you for your reply. I appreciate it a lot. I will:
-- (minor clarification:)
I wonder if you have any preference about whether this should be in this same patch, or should be a separate follow-up. I have no strong opinion (shouldn't be too much work), so I will just go with whichever you prefer. |
connection-timeout seems to be a better name. I'll wait for the current discussions to finish before reviewing again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the comment I left.
@JDevlieghere, @walter-erquinigo: I have updated the patch according to your suggestions. LMK if it looks alright. Done changes:
Caveats:
|
time-to-live
when using --connection
--connection-timeout
to use with --connection
--connection-timeout
to use with --connection
--connection-timeout
--connection-timeout
--connection-timeout
@royitaqi , please add a section in the README about the server mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the documentation and i'll approve this :)
I also requested a minor change
std::tie(protocol, name) = *maybeProtoclAndName; | ||
if (auto Err = serveConnection(protocol, name, log.get(), default_repl_mode, | ||
pre_init_commands)) { | ||
if (auto Err = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the coding convention says that auto can/should be used when it's obvious from the context what the actual type is. IMHO the type here is obvious (llvm::Error
). With that said, I can spell it out in my next update.
EDIT: Actually, the file contains multiple auto Err
, and so I think it's better to leave them as auto to be consistent. I can do a second PR to change all of them if you have a strong opinion about it.
FWIW, that auto
was from existing code. For that line, I just added a parameter and it caused a formatting change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh okay, that's fine.
Well, for lldb-dap llvm::Error is kind of obvious, but not for all llvm code, where llvm::ErrorOr can be used as well.
Anyway, all good with this change
@walter-erquinigo: Added documentation. Also fixed an error in existing documentation about how to open the Settings page. LMK if it looks alright (if not, feel free to give me the specific change that you want me to put in). Also LMK if you are okay with skipping the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome documentation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Latest update: This has been fixed in #157924 and the buildbot is green now. -- (below are older comments) -- I'm seeing a buildbot failure. https://lab.llvm.org/buildbot/#/changes/52237 Rerunning it while double checking locally. EDIT: Here is the error log:
I'm guessing the timeout in the new tests are too small and doesn't allow a complete termination of the server on Debian. I will fix forward (i.e. create a new patch to make the timeout larger). |
See #156803 (comment) for full context. Summary: * In #156803, we see a builtbot failure for `TestDAP_server.py` on Debian. Note that macOS and other Linux distributions (like CentOS, or whatever Linux the [required buildbot](https://github.com/llvm/llvm-project/actions/runs/17594257911/job/49982560193#logs) uses) seem to pass the tests. * In the 3 newly added tests, the most complex test case passed, while the other easier ones failed. ``` PASS: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_multiple_sessions (TestDAP_server.TestDAP_server.test_connection_timeout_multiple_sessions) FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_long_debug_session (TestDAP_server.TestDAP_server.test_connection_timeout_long_debug_session) FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_at_server_start (TestDAP_server.TestDAP_server.test_connection_timeout_at_server_start) ``` * The error is that `process.wait(timeout)` timed out during the teardown of the tests. * The above suggests that maybe the root cause is that the timeout is set too strictly (and that maybe the server termination on Debian is slower than the other Linux distribution for some reason). * This patch loosens that timeout from 2s to 5s. Let's see if this works. * FWIW, I cannot verify the root cause, because I don't have a Debian machine.
See llvm/llvm-project#156803 (comment) for full context. Summary: * In llvm/llvm-project#156803, we see a builtbot failure for `TestDAP_server.py` on Debian. Note that macOS and other Linux distributions (like CentOS, or whatever Linux the [required buildbot](https://github.com/llvm/llvm-project/actions/runs/17594257911/job/49982560193#logs) uses) seem to pass the tests. * In the 3 newly added tests, the most complex test case passed, while the other easier ones failed. ``` PASS: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_multiple_sessions (TestDAP_server.TestDAP_server.test_connection_timeout_multiple_sessions) FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_long_debug_session (TestDAP_server.TestDAP_server.test_connection_timeout_long_debug_session) FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_at_server_start (TestDAP_server.TestDAP_server.test_connection_timeout_at_server_start) ``` * The error is that `process.wait(timeout)` timed out during the teardown of the tests. * The above suggests that maybe the root cause is that the timeout is set too strictly (and that maybe the server termination on Debian is slower than the other Linux distribution for some reason). * This patch loosens that timeout from 2s to 5s. Let's see if this works. * FWIW, I cannot verify the root cause, because I don't have a Debian machine.
# Usage This is an optional new command line option to use with `--connection`. ``` --connection-timeout <timeout> When using --connection, the number of seconds to wait for new connections after the server has started and after all clients have disconnected. Each new connection will reset the timeout. When the timeout is reached, the server will be closed and the process will exit. Not specifying this argument or specifying non-positive values will cause the server to wait for new connections indefinitely. ``` A corresponding extension setting `Connection Timeout` is added to the `lldb-dap` VS Code extension. # Benefits Automatic release of resources when lldb-dap is no longer being used (e.g. release memory used by module cache). # Test See PR. (cherry picked from commit acb38c8)
See llvm#156803 (comment) for full context. Summary: * In llvm#156803, we see a builtbot failure for `TestDAP_server.py` on Debian. Note that macOS and other Linux distributions (like CentOS, or whatever Linux the [required buildbot](https://github.com/llvm/llvm-project/actions/runs/17594257911/job/49982560193#logs) uses) seem to pass the tests. * In the 3 newly added tests, the most complex test case passed, while the other easier ones failed. ``` PASS: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_multiple_sessions (TestDAP_server.TestDAP_server.test_connection_timeout_multiple_sessions) FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_long_debug_session (TestDAP_server.TestDAP_server.test_connection_timeout_long_debug_session) FAIL: LLDB (/home/worker/2.0.1/lldb-x86_64-debian/build/bin/clang-x86_64) :: test_connection_timeout_at_server_start (TestDAP_server.TestDAP_server.test_connection_timeout_at_server_start) ``` * The error is that `process.wait(timeout)` timed out during the teardown of the tests. * The above suggests that maybe the root cause is that the timeout is set too strictly (and that maybe the server termination on Debian is slower than the other Linux distribution for some reason). * This patch loosens that timeout from 2s to 5s. Let's see if this works. * FWIW, I cannot verify the root cause, because I don't have a Debian machine. (cherry picked from commit 11a4f5b)
Usage
This is an optional new command line option to use with
--connection
.A corresponding extension setting
Connection Timeout
is added to thelldb-dap
VS Code extension.Benefits
Automatic release of resources when lldb-dap is no longer being used (e.g. release memory used by module cache).
Test
Added 3 API tests.
Before adding the tests, there were already 3 tests and they took 11.7s to run.