[lldb] refactor PlatformAndroid and make threadsafe #145382
@llvm/pr-subscribers-lldb

Author: Chad Smith (cs01)

Changes

`AdbClient::GetSyncService` and `PlatformAndroid::GetSyncService` are not thread-safe. This was not an issue because they were (apparently) not used in a threaded environment. However, when the new setting `target.parallel-module-load` was added, they began being called from multiple threads to load modules simultaneously. Setting it to `true` triggers crashes; setting it to `false` avoids them. The top of the crashing stack was in `ConnectionFileDescriptor::Disconnect`, reached from the `ConnectionFileDescriptor` destructor inside `AdbClient::SyncService::Stat`.

Our workaround was to set it to `false`, but that did not fix the root cause. This PR fixes the root cause by creating a new connection for each adb SyncService.

Full diff: https://github.com/llvm/llvm-project/pull/145382.diff

5 Files Affected:
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.cpp b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
index a179260ca15f6..4c5342f7af60b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.cpp
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
@@ -417,13 +417,30 @@ Status AdbClient::ShellToFile(const char *command, milliseconds timeout,
return Status();
}
+// Returns a sync service for file operations.
+// This operation is thread-safe - each call creates an isolated sync service
+// with its own connection to avoid race conditions.
std::unique_ptr<AdbClient::SyncService>
AdbClient::GetSyncService(Status &error) {
- std::unique_ptr<SyncService> sync_service;
- error = StartSync();
- if (error.Success())
- sync_service.reset(new SyncService(std::move(m_conn)));
+ std::lock_guard<std::mutex> lock(m_sync_mutex);
+
+ // Create a temporary AdbClient with its own connection for this sync service
+ // This avoids the race condition of multiple threads accessing the same
+ // connection
+ AdbClient temp_client(m_device_id);
+
+ // Connect and start sync on the temporary client
+ error = temp_client.Connect();
+ if (error.Fail())
+ return nullptr;
+ error = temp_client.StartSync();
+ if (error.Fail())
+ return nullptr;
+
+ // Move the connection from the temporary client to the sync service
+ std::unique_ptr<SyncService> sync_service;
+ sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
return sync_service;
}
@@ -487,7 +504,9 @@ Status AdbClient::SyncService::internalPushFile(const FileSpec &local_file,
error.AsCString());
}
error = SendSyncRequest(
- kDONE, llvm::sys::toTimeT(FileSystem::Instance().GetModificationTime(local_file)),
+ kDONE,
+ llvm::sys::toTimeT(
+ FileSystem::Instance().GetModificationTime(local_file)),
nullptr);
if (error.Fail())
return error;
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.h b/lldb/source/Plugins/Platform/Android/AdbClient.h
index 851c09957bd4a..9ef780bc6202b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.h
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.h
@@ -14,6 +14,7 @@
#include <functional>
#include <list>
#include <memory>
+#include <mutex>
#include <string>
#include <vector>
@@ -135,6 +136,7 @@ class AdbClient {
std::string m_device_id;
std::unique_ptr<Connection> m_conn;
+ mutable std::mutex m_sync_mutex;
};
} // namespace platform_android
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
index 5bc9cc133fbd3..68e4886cb1c7a 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
@@ -474,13 +474,11 @@ std::string PlatformAndroid::GetRunAs() {
return run_as.str();
}
-AdbClient::SyncService *PlatformAndroid::GetSyncService(Status &error) {
- if (m_adb_sync_svc && m_adb_sync_svc->IsConnected())
- return m_adb_sync_svc.get();
-
+std::unique_ptr<AdbClient::SyncService>
+PlatformAndroid::GetSyncService(Status &error) {
AdbClientUP adb(GetAdbClient(error));
if (error.Fail())
return nullptr;
- m_adb_sync_svc = adb->GetSyncService(error);
- return (error.Success()) ? m_adb_sync_svc.get() : nullptr;
+
+ return adb->GetSyncService(error);
}
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
index 5602edf73c1d3..3140acb573416 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
@@ -80,9 +80,7 @@ class PlatformAndroid : public platform_linux::PlatformLinux {
std::string GetRunAs();
private:
- AdbClient::SyncService *GetSyncService(Status &error);
-
- std::unique_ptr<AdbClient::SyncService> m_adb_sync_svc;
+ std::unique_ptr<AdbClient::SyncService> GetSyncService(Status &error);
std::string m_device_id;
uint32_t m_sdk_version;
};
diff --git a/lldb/unittests/Platform/Android/AdbClientTest.cpp b/lldb/unittests/Platform/Android/AdbClientTest.cpp
index 0808b96f69fc8..c2f658b9d1bc1 100644
--- a/lldb/unittests/Platform/Android/AdbClientTest.cpp
+++ b/lldb/unittests/Platform/Android/AdbClientTest.cpp
@@ -6,9 +6,13 @@
//
//===----------------------------------------------------------------------===//
-#include "gtest/gtest.h"
#include "Plugins/Platform/Android/AdbClient.h"
+#include "gtest/gtest.h"
+#include <atomic>
#include <cstdlib>
+#include <future>
+#include <thread>
+#include <vector>
static void set_env(const char *var, const char *value) {
#ifdef _WIN32
@@ -31,14 +35,14 @@ class AdbClientTest : public ::testing::Test {
void TearDown() override { set_env("ANDROID_SERIAL", ""); }
};
-TEST(AdbClientTest, CreateByDeviceId) {
+TEST_F(AdbClientTest, CreateByDeviceId) {
AdbClient adb;
Status error = AdbClient::CreateByDeviceID("device1", adb);
EXPECT_TRUE(error.Success());
EXPECT_EQ("device1", adb.GetDeviceID());
}
-TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
+TEST_F(AdbClientTest, CreateByDeviceId_ByEnvVar) {
set_env("ANDROID_SERIAL", "device2");
AdbClient adb;
@@ -47,5 +51,116 @@ TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
EXPECT_EQ("device2", adb.GetDeviceID());
}
+TEST_F(AdbClientTest, GetSyncServiceThreadSafe) {
+ // Test high-volume concurrent access to GetSyncService
+ // This test verifies thread safety under sustained load with many rapid calls
+ // Catches race conditions that emerge when multiple threads make repeated
+ // calls to GetSyncService on the same AdbClient instance
+
+ AdbClient shared_adb_client("test_device");
+
+ const int num_threads = 8;
+ std::vector<std::future<bool>> futures;
+ std::atomic<int> success_count{0};
+ std::atomic<int> null_count{0};
+
+ // Launch multiple threads that all call GetSyncService on the SAME AdbClient
+ for (int i = 0; i < num_threads; ++i) {
+ futures.push_back(std::async(std::launch::async, [&]() {
+ // Multiple rapid calls to trigger the race condition
+ for (int j = 0; j < 20; ++j) {
+ Status error;
+
+ auto sync_service = shared_adb_client.GetSyncService(error);
+
+ if (sync_service != nullptr) {
+ success_count++;
+ } else {
+ null_count++;
+ }
+
+ // Small delay to increase chance of hitting the race condition
+ std::this_thread::sleep_for(std::chrono::microseconds(1));
+ }
+ return true;
+ }));
+ }
+
+ // Wait for all threads to complete
+ bool all_completed = true;
+ for (auto &future : futures) {
+ bool thread_result = future.get();
+ if (!thread_result) {
+ all_completed = false;
+ }
+ }
+
+ // This should pass (though sync services may fail
+ // to connect)
+ EXPECT_TRUE(all_completed) << "Parallel GetSyncService calls should not "
+ "crash due to race conditions. "
+ << "Successes: " << success_count.load()
+ << ", Nulls: " << null_count.load();
+
+ // The key test: we should complete all operations without crashing
+ int total_operations = num_threads * 20;
+ int completed_operations = success_count.load() + null_count.load();
+ EXPECT_EQ(total_operations, completed_operations)
+ << "All operations should complete without crashing";
+}
+
+TEST_F(AdbClientTest, ConnectionMoveRaceCondition) {
+ // Test simultaneous access timing to GetSyncService
+ // This test verifies thread safety when multiple threads start at exactly
+ // the same time, maximizing the chance of hitting precise timing conflicts
+ // Catches race conditions that occur with synchronized simultaneous access
+
+ AdbClient adb_client("test_device");
+
+ // Try to trigger the exact race condition by having multiple threads
+ // simultaneously call GetSyncService
+
+ std::atomic<bool> start_flag{false};
+ std::vector<std::thread> threads;
+ std::atomic<int> null_service_count{0};
+ std::atomic<int> valid_service_count{0};
+
+ const int num_threads = 10;
+
+ // Create threads that will all start simultaneously
+ for (int i = 0; i < num_threads; ++i) {
+ threads.emplace_back([&]() {
+ // Wait for all threads to be ready
+ while (!start_flag.load()) {
+ std::this_thread::yield();
+ }
+
+ Status error;
+ auto sync_service = adb_client.GetSyncService(error);
+
+ if (sync_service == nullptr) {
+ null_service_count++;
+ } else {
+ valid_service_count++;
+ }
+ });
+ }
+
+ // Start all threads simultaneously to maximize chance of race condition
+ start_flag.store(true);
+
+ // Wait for all threads to complete
+ for (auto &thread : threads) {
+ thread.join();
+ }
+
+ // The test passes if we don't crash
+ int total_results = null_service_count.load() + valid_service_count.load();
+ EXPECT_EQ(num_threads, total_results)
+ << "All threads should complete without crashing. "
+ << "Null services: " << null_service_count.load()
+ << ", Valid services: " << valid_service_count.load();
+}
+
} // end namespace platform_android
} // end namespace lldb_private
That's one way to fix the problem. Another (maybe even more obvious) would be to use a single SyncService object but synchronize all access to it. I think your approach makes sense -- it unlocks more parallelism, though that parallelism may be a mirage if the device connection is going to be the bottleneck. And all the extra connections may add some overhead. Have you looked into how the two approaches compare?
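The single-SyncService alternative described above can be sketched as a small, self-contained model. It is a hypothetical illustration, not LLDB code: `FakeSyncConnection`, `SerializedSyncService`, and `RunSerialized` are made-up names, and the "connection" only records requests instead of talking to an adb server.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one adb sync connection. Like a real sync
// stream, it is not safe to use from two threads at once.
class FakeSyncConnection {
public:
  void Stat(const std::string &path) { m_requests.push_back(path); }
  std::size_t RequestCount() const { return m_requests.size(); }

private:
  std::vector<std::string> m_requests;
};

// Alternative approach: keep a single shared service and serialize every
// call on it with a mutex, so only one request is in flight at a time.
class SerializedSyncService {
public:
  void Stat(const std::string &path) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_conn.Stat(path);
  }
  std::size_t RequestCount() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_conn.RequestCount();
  }

private:
  std::mutex m_mutex;
  FakeSyncConnection m_conn;
};

// Starts num_threads threads that each issue per_thread Stat calls on the
// same service, then returns how many requests the connection saw.
std::size_t RunSerialized(int num_threads, int per_thread) {
  assert(num_threads >= 0 && per_thread >= 0);
  SerializedSyncService service;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t)
    threads.emplace_back([&service, per_thread] {
      for (int i = 0; i < per_thread; ++i)
        service.Stat("/system/lib64/libc.so");
    });
  for (std::thread &th : threads)
    th.join();
  return service.RequestCount();
}
```

In this model every `Stat` waits for the previous one to finish, so throughput is bounded by one connection; creating a connection per SyncService instead lets transfers run concurrently, at the cost of setting up more connections. That is the trade-off the question above asks about.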
As for the patch itself, the code makes sense, but the result looks very path-dependent (meaning: you'd never write code like this -- creating a temporary AdbClient object -- if you were writing this from scratch). It's also not optimal, as you're creating a temporary AdbClient in PlatformAndroid::GetSyncService and then creating another temporary object inside AdbClient::GetSyncService.
I think this would look better if PlatformAndroid::GetSyncService constructed a (new) SyncService directly. The SyncService could create a temporary AdbClient object as an implementation detail (or not; maybe it could handle all of the connection setup internally).
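A minimal sketch of the suggested shape, assuming a static factory on the sync service: the caller passes only the device ID, and the service owns and sets up its own connection. `SketchStatus`, `SketchConnection`, and `SketchSyncService::Create` are hypothetical stand-ins for `Status`, `Connection`, and a real constructor; the actual adb connect and sync-mode steps are reduced to a comment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for lldb_private::Status: empty message == success.
struct SketchStatus {
  std::string message;
  bool Fail() const { return !message.empty(); }
};

// Hypothetical connection; real code would open a socket to the adb server.
class SketchConnection {
public:
  explicit SketchConnection(std::string device_id)
      : m_device_id(std::move(device_id)) {}
  const std::string &DeviceID() const { return m_device_id; }

private:
  std::string m_device_id;
};

// The sync service performs its own connection setup, so a caller such as
// PlatformAndroid::GetSyncService only needs the device ID.
class SketchSyncService {
public:
  static std::unique_ptr<SketchSyncService> Create(const std::string &device_id,
                                                   SketchStatus &error) {
    if (device_id.empty()) {
      error.message = "no device selected";
      return nullptr;
    }
    // Real code would connect here and switch the connection to sync mode.
    auto conn = std::make_unique<SketchConnection>(device_id);
    return std::unique_ptr<SketchSyncService>(
        new SketchSyncService(std::move(conn)));
  }

  const std::string &DeviceID() const {
    assert(m_conn && "only Create() constructs a service");
    return m_conn->DeviceID();
  }

private:
  explicit SketchSyncService(std::unique_ptr<SketchConnection> conn)
      : m_conn(std::move(conn)) {}

  std::unique_ptr<SketchConnection> m_conn; // owned exclusively
};
```

With this shape, `PlatformAndroid::GetSyncService` could reduce to a single `Create` call, with no temporary `AdbClient` at either level.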
@cs01, can you get a more symbolicated stack trace for the crash, with debug info and line info?
"use after free" and "race condition" aren't mutually exclusive. The backtrace is consistent with one thread destroying (moving from) the
Thank you both for the feedback. I started working on changes that add more mutexes to AdbClient methods, use a shared pointer for the sync service, and make the creation of the SyncService cleaner and less path-dependent (I agree with your comments, labath). I will push an update either today or sometime next week.
After more analysis, I think a shared_ptr will be too difficult to implement, since the connection and client manage their lifecycle assuming they are the only client/thread using it. It would be challenging to ensure the connection doesn't get disconnected or freed while other threads are using it. Creating a new connection for each SyncService should add negligible overhead (several bytes to establish the connection), especially compared to pushing or pulling a file. I tested a build of this in a real-world scenario and the performance seemed unchanged.
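A small single-threaded sketch (hypothetical types, no adb involved) of why shared ownership alone would not be enough: a `std::shared_ptr` keeps the object alive, which removes the use-after-free, but any owner can still disconnect or otherwise change the shared connection underneath the others. `std::shared_ptr` also does nothing to serialize concurrent reads and writes on the same stream; only its reference count is thread-safe.

```cpp
#include <cassert>
#include <memory>

// Hypothetical connection with a connected/disconnected state.
struct SharedConnection {
  bool connected = true;
  void Disconnect() { connected = false; }
};

// Shared ownership prevents the object from being freed, but not from
// being disconnected by another owner.
bool StillConnectedAfterOwnerResets() {
  auto client_ref = std::make_shared<SharedConnection>();
  std::shared_ptr<SharedConnection> sync_ref = client_ref; // second owner
  client_ref->Disconnect(); // e.g. the client resetting its connection
  client_ref.reset();       // the client drops its reference
  assert(sync_ref.use_count() == 1); // object still alive: no use-after-free
  return sync_ref->connected;        // ...but it is no longer connected
}
```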
It has been a week since the last update was published. Friendly ping to @labath @jeffreytan81 (or anyone else who wants to review)
I'm not very fond of the number of validity checks this PR is adding. What's up with that? Is it somehow related to you passing a non-nullptr-but-uninitialized Connection object (`std::make_unique<ConnectionFileDescriptor>()`) into the AdbClient? Any chance to get rid of that? Maybe by using a nullptr to mean "no connection"? Or by reducing the amount of moving around?
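A sketch of the "nullptr means no connection" idea, assuming the client owns its connection through a `std::unique_ptr`: the connection object exists only after a successful connect, so a single null check replaces separate checks on a constructed-but-unconnected object. `NullableConnClient` and `ClientConnection` are hypothetical names, and the real socket setup is reduced to a comment.

```cpp
#include <cassert>
#include <memory>

// Hypothetical connection type; an instance exists only once connected.
struct ClientConnection {};

// Hypothetical client in which a null m_conn is the single "no connection"
// state, so callers need one check instead of several validity checks.
class NullableConnClient {
public:
  bool Connect() {
    // This sketch treats connecting twice as a programming error.
    assert(!m_conn && "already connected");
    // Real code would open the socket here and return false on failure.
    m_conn = std::make_unique<ClientConnection>();
    return true;
  }
  bool IsConnected() const { return m_conn != nullptr; }
  void Disconnect() { m_conn.reset(); }

private:
  std::unique_ptr<ClientConnection> m_conn; // nullptr until Connect()
};
```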
Thanks for the feedback; I agree, it was confusing. I was trying to make the minimal change needed to make it work without too big of a refactor, since this is my first PR to lldb. Based on your feedback, I did a more extensive refactor.
In lldb, turn on logs
then attach to an Android process (e.g.
In lldb, get a file to confirm the sync service works:
shows
I also added an implementation for
Friendly ping @labath and anyone else who has any thoughts
@JDevlieghere @adrian-prantl, would you review this diff and unblock @cs01?
Anything we can do to help move this along? I would love for people with Android expertise to comment, or I can approve early next week if there are no further comments.
I like what you've done, though it makes it a bit difficult to review. In the future, if you find yourself moving a lot of code around, try to separate that from the functional changes into a PR of its own. That makes things flow a lot easier.
In this case, it might also have made sense to keep everything in a single file. There's not that much code, and it would avoid this business with namespaces.
In fact, I'd suggest moving everything back into a single file anyway, as I think it's the easiest way to resolve the question of naming the namespace. I see what you tried to do, but I don't think adb_client_utils is a good name. I think it would make more sense to have an adb namespace containing all of the adb-related code, but that would create even more churn. If you just move things back roughly where they originally were, it should also reduce the size of the diff.
I have a feeling something went wrong with the latest update (I don't think you meant to undo the last month's changes to ConnectionFileDescriptor). Can you fix that? I'm fine with a force-push, if needed.
Did you also want to drop the "Draft" annotation (mark as ready for review)?
Hi, yes, sorry about that. I did not mean to push the file in that state, so I marked the PR as a draft while I get it in order. Other than the ConnectionFileDescriptor changes, it's generally ready for review if you want to take a look with that in mind. By the way, I see there are workflows awaiting approval; is that something that is done before or after the PR is approved?
@labath I think I have everything in order now; it's ready for another pass. Thank you so much for your time and help on this!
Force-pushed from 4756820 to 0f906e2.
Friendly ping @labath
Looks good. Just a couple of details.
@labath @JDevlieghere thanks again for all the feedback. I believe I have addressed your comments.
@labath Is there anything else needed from me? It looks like there are still workflows awaiting approval.
I ran them and enabled auto-merge, so this PR will get merged automatically if everything passes.
@cs01 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.
## Problem

When the new setting

```
set target.parallel-module-load true
```

was added, lldb began fetching modules from the device from multiple threads simultaneously. This caused lldb to crash when debugging on Android devices. The top of the stack in the crash looked something like this:

```
#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
#1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
#2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
#3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
#4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
#5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
#6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
#7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
#8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)
```

Our workaround was to set `target.parallel-module-load` to `false` to avoid the crash.

## Background

PlatformAndroid creates two different classes with one stateful adb connection shared between the two: one through AdbClient and another through AdbClient::SyncService. The connection management and state are complex, and seem to be responsible for the segfault we are seeing. The AdbClient code resets these connections at times, and re-establishes connections if they are not active. Similarly, PlatformAndroid caches its SyncService, which uses an AdbClient, but the SyncService puts its connection into a different 'sync' state that is incompatible with a standard connection.

## Changes in this diff

* This diff refactors the code to (hopefully) give clearer ownership of the connection and a clearer separation of concerns between AdbClient and SyncService, by moving the sync logic into a new class, AdbSyncService.
* New unit tests are added.
* Additional logs were added (see llvm/llvm-project#145382 (comment) for details).
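The intended usage after the refactor can be sketched as follows, assuming (as the description says) that each sync service owns its own connection: a parallel module load creates one service per task, so no connection state is shared between threads. `PerTaskSyncService`, `LoadModulesInParallel`, the device serial `emulator-5554`, and the module paths are hypothetical; no adb traffic takes place.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AdbSyncService instance that owns a private
// connection; nothing is shared between instances.
class PerTaskSyncService {
public:
  explicit PerTaskSyncService(std::string device_id)
      : m_device_id(std::move(device_id)) {}

  // Pretends to pull one file and returns how many requests this instance
  // has sent on its own connection.
  int PullFile(const std::string &remote_path) {
    assert(!remote_path.empty());
    return ++m_requests;
  }

  const std::string &DeviceID() const { return m_device_id; }

private:
  std::string m_device_id;
  int m_requests = 0;
};

// Loads each module on its own task, each with its own service, the way a
// parallel module load could use one sync connection per request.
int LoadModulesInParallel(const std::vector<std::string> &paths) {
  std::vector<std::future<int>> futures;
  for (const std::string &path : paths)
    futures.push_back(std::async(std::launch::async, [path] {
      PerTaskSyncService service("emulator-5554");
      return service.PullFile(path);
    }));
  int total = 0;
  for (std::future<int> &f : futures)
    total += f.get();
  return total;
}
```

Because each task constructs and destroys its own service, there is nothing for two threads to move from, reset, or disconnect at the same time; the price is one connection setup per request.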
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/22573 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/197/builds/7996 Here is the relevant piece of the build log for the reference
Please take a look and fix it ASAP.
This also broke lldb-aarch64-windows: https://lab.llvm.org/buildbot/#/builders/141/builds/10795
Reverts #145382. This broke a couple of buildbots.
Reverted to fix the buildbots. Please let me know if any help is needed to reproduce the issue or test a fix.
…e" (#153626) Reverts llvm/llvm-project#145382. This broke a couple of buildbots.