[lldb] refactor PlatformAndroid and make threadsafe #145382

Merged (3 commits) on Aug 13, 2025

cs01
Contributor

@cs01 cs01 commented Jun 23, 2025

Problem

When the new setting

set target.parallel-module-load true

was added, lldb began fetching modules from the device on multiple threads simultaneously. This caused lldb to crash when debugging on Android devices.

The top of the stack in the crash looks something like this:

#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
 #1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
 #2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
 #3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
 #4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
 #5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
 #6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
 #7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
 #8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)

Our workaround was to set target.parallel-module-load to false to avoid the crash.

Background

PlatformAndroid creates two different classes with one stateful adb connection shared between the two -- one through AdbClient and another through AdbClient::SyncService. The connection management and state is complex, and seems to be responsible for the segfault we are seeing. The AdbClient code resets these connections at times, and re-establishes connections if they are not active. Similarly, PlatformAndroid caches its SyncService, which uses an AdbClient class, but the SyncService puts its connection into a different 'sync' state that is incompatible with a standard connection.
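To make the ownership problem concrete, here is a minimal single-threaded model of a client handing its only connection to a sync service, as the pre-patch `GetSyncService` does with `std::move(m_conn)`. All `Fake*` types and the helper function are hypothetical stand-ins invented for illustration; they are not the real lldb classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for lldb's Connection class.
struct FakeConnection {};

// Simplified sync service that takes sole ownership of a connection.
struct FakeSyncService {
  explicit FakeSyncService(std::unique_ptr<FakeConnection> conn)
      : m_conn(std::move(conn)) {}
  std::unique_ptr<FakeConnection> m_conn;
};

// Simplified model of a client that gives away its only connection.
struct FakeAdbClient {
  std::unique_ptr<FakeConnection> m_conn = std::make_unique<FakeConnection>();

  std::unique_ptr<FakeSyncService> GetSyncService() {
    // std::move leaves m_conn null: the client no longer owns a connection.
    return std::make_unique<FakeSyncService>(std::move(m_conn));
  }

  bool HasConnection() const { return m_conn != nullptr; }
};

// Returns true if the hand-off left the client without a connection while
// the service holds a valid one.
inline bool ClientLosesConnectionAfterHandOff() {
  FakeAdbClient client;
  assert(client.HasConnection());
  std::unique_ptr<FakeSyncService> service = client.GetSyncService();
  return !client.HasConnection() && service->m_conn != nullptr;
}
```

In this model the hand-off is harmless because only one thread runs. If a second thread were still using the client's connection at the moment of the move, it would find it null or already destroyed, which is one plausible reading of the crash.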

Changes in this diff

  • This diff refactors the code to (hopefully) give the connection clearer ownership and to separate AdbClient from SyncService more cleanly by introducing a new class, AdbSyncService.
  • New unit tests are added
  • Additional logs were added (see [lldb] refactor PlatformAndroid and make threadsafe #145382 (comment) for details)

@cs01 cs01 requested a review from JDevlieghere as a code owner June 23, 2025 18:42

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Jun 23, 2025
@llvmbot
Member

llvmbot commented Jun 23, 2025

@llvm/pr-subscribers-lldb

Author: Chad Smith (cs01)

Changes

AdbClient::GetSyncService and PlatformAndroid::GetSyncService are not threadsafe. This was not an issue because they were (apparently) not used in a threaded environment. However, when the new setting

set target.parallel-module-load true

was added, it began being called from multiple threads to load modules simultaneously.

Setting it to true triggers crashes; setting it to false avoids them.

The top of the stack in the crash looked something like this:

lldb_private::platform_android::AdbClient::SyncService::SendSyncRequest(char const*, unsigned int, void const*) at /usr/lib/liblldb.so.15.0.0:0
lldb_private::platform_android::AdbClient::SyncService::internalStat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) at /usr/lib/liblldb.so.15.0.0:0
lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) at /usr/lib/liblldb.so.15.0.0:0

Our workaround was to set it to False, but the root cause was not fixed. This PR fixes the root cause by creating a new connection for each Adb SyncService.


Full diff: https://github.com/llvm/llvm-project/pull/145382.diff

5 Files Affected:

  • (modified) lldb/source/Plugins/Platform/Android/AdbClient.cpp (+24-5)
  • (modified) lldb/source/Plugins/Platform/Android/AdbClient.h (+2)
  • (modified) lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp (+4-6)
  • (modified) lldb/source/Plugins/Platform/Android/PlatformAndroid.h (+1-3)
  • (modified) lldb/unittests/Platform/Android/AdbClientTest.cpp (+118-3)
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.cpp b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
index a179260ca15f6..4c5342f7af60b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.cpp
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
@@ -417,13 +417,30 @@ Status AdbClient::ShellToFile(const char *command, milliseconds timeout,
   return Status();
 }
 
+// Returns a sync service for file operations.
+// This operation is thread-safe - each call creates an isolated sync service
+// with its own connection to avoid race conditions.
 std::unique_ptr<AdbClient::SyncService>
 AdbClient::GetSyncService(Status &error) {
-  std::unique_ptr<SyncService> sync_service;
-  error = StartSync();
-  if (error.Success())
-    sync_service.reset(new SyncService(std::move(m_conn)));
+  std::lock_guard<std::mutex> lock(m_sync_mutex);
+
+  // Create a temporary AdbClient with its own connection for this sync service
+  // This avoids the race condition of multiple threads accessing the same
+  // connection
+  AdbClient temp_client(m_device_id);
+
+  // Connect and start sync on the temporary client
+  error = temp_client.Connect();
+  if (error.Fail())
+    return nullptr;
 
+  error = temp_client.StartSync();
+  if (error.Fail())
+    return nullptr;
+
+  // Move the connection from the temporary client to the sync service
+  std::unique_ptr<SyncService> sync_service;
+  sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
   return sync_service;
 }
 
@@ -487,7 +504,9 @@ Status AdbClient::SyncService::internalPushFile(const FileSpec &local_file,
                                                error.AsCString());
   }
   error = SendSyncRequest(
-      kDONE, llvm::sys::toTimeT(FileSystem::Instance().GetModificationTime(local_file)),
+      kDONE,
+      llvm::sys::toTimeT(
+          FileSystem::Instance().GetModificationTime(local_file)),
       nullptr);
   if (error.Fail())
     return error;
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.h b/lldb/source/Plugins/Platform/Android/AdbClient.h
index 851c09957bd4a..9ef780bc6202b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.h
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.h
@@ -14,6 +14,7 @@
 #include <functional>
 #include <list>
 #include <memory>
+#include <mutex>
 #include <string>
 #include <vector>
 
@@ -135,6 +136,7 @@ class AdbClient {
 
   std::string m_device_id;
   std::unique_ptr<Connection> m_conn;
+  mutable std::mutex m_sync_mutex;
 };
 
 } // namespace platform_android
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
index 5bc9cc133fbd3..68e4886cb1c7a 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
@@ -474,13 +474,11 @@ std::string PlatformAndroid::GetRunAs() {
   return run_as.str();
 }
 
-AdbClient::SyncService *PlatformAndroid::GetSyncService(Status &error) {
-  if (m_adb_sync_svc && m_adb_sync_svc->IsConnected())
-    return m_adb_sync_svc.get();
-
+std::unique_ptr<AdbClient::SyncService>
+PlatformAndroid::GetSyncService(Status &error) {
   AdbClientUP adb(GetAdbClient(error));
   if (error.Fail())
     return nullptr;
-  m_adb_sync_svc = adb->GetSyncService(error);
-  return (error.Success()) ? m_adb_sync_svc.get() : nullptr;
+
+  return adb->GetSyncService(error);
 }
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
index 5602edf73c1d3..3140acb573416 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
@@ -80,9 +80,7 @@ class PlatformAndroid : public platform_linux::PlatformLinux {
   std::string GetRunAs();
 
 private:
-  AdbClient::SyncService *GetSyncService(Status &error);
-
-  std::unique_ptr<AdbClient::SyncService> m_adb_sync_svc;
+  std::unique_ptr<AdbClient::SyncService> GetSyncService(Status &error);
   std::string m_device_id;
   uint32_t m_sdk_version;
 };
diff --git a/lldb/unittests/Platform/Android/AdbClientTest.cpp b/lldb/unittests/Platform/Android/AdbClientTest.cpp
index 0808b96f69fc8..c2f658b9d1bc1 100644
--- a/lldb/unittests/Platform/Android/AdbClientTest.cpp
+++ b/lldb/unittests/Platform/Android/AdbClientTest.cpp
@@ -6,9 +6,13 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "gtest/gtest.h"
 #include "Plugins/Platform/Android/AdbClient.h"
+#include "gtest/gtest.h"
+#include <atomic>
 #include <cstdlib>
+#include <future>
+#include <thread>
+#include <vector>
 
 static void set_env(const char *var, const char *value) {
 #ifdef _WIN32
@@ -31,14 +35,14 @@ class AdbClientTest : public ::testing::Test {
   void TearDown() override { set_env("ANDROID_SERIAL", ""); }
 };
 
-TEST(AdbClientTest, CreateByDeviceId) {
+TEST_F(AdbClientTest, CreateByDeviceId) {
   AdbClient adb;
   Status error = AdbClient::CreateByDeviceID("device1", adb);
   EXPECT_TRUE(error.Success());
   EXPECT_EQ("device1", adb.GetDeviceID());
 }
 
-TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
+TEST_F(AdbClientTest, CreateByDeviceId_ByEnvVar) {
   set_env("ANDROID_SERIAL", "device2");
 
   AdbClient adb;
@@ -47,5 +51,116 @@ TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
   EXPECT_EQ("device2", adb.GetDeviceID());
 }
 
+TEST_F(AdbClientTest, GetSyncServiceThreadSafe) {
+  // Test high-volume concurrent access to GetSyncService
+  // This test verifies thread safety under sustained load with many rapid calls
+  // Catches race conditions that emerge when multiple threads make repeated
+  // calls to GetSyncService on the same AdbClient instance
+
+  AdbClient shared_adb_client("test_device");
+
+  const int num_threads = 8;
+  std::vector<std::future<bool>> futures;
+  std::atomic<int> success_count{0};
+  std::atomic<int> null_count{0};
+
+  // Launch multiple threads that all call GetSyncService on the SAME AdbClient
+  for (int i = 0; i < num_threads; ++i) {
+    futures.push_back(std::async(std::launch::async, [&]() {
+      // Multiple rapid calls to trigger the race condition
+      for (int j = 0; j < 20; ++j) {
+        Status error;
+
+        auto sync_service = shared_adb_client.GetSyncService(error);
+
+        if (sync_service != nullptr) {
+          success_count++;
+        } else {
+          null_count++;
+        }
+
+        // Small delay to increase chance of hitting the race condition
+        std::this_thread::sleep_for(std::chrono::microseconds(1));
+      }
+      return true;
+    }));
+  }
+
+  // Wait for all threads to complete
+  bool all_completed = true;
+  for (auto &future : futures) {
+    bool thread_result = future.get();
+    if (!thread_result) {
+      all_completed = false;
+    }
+  }
+
+  // This should pass (though sync services may fail
+  // to connect)
+  EXPECT_TRUE(all_completed) << "Parallel GetSyncService calls should not "
+                                "crash due to race conditions. "
+                             << "Successes: " << success_count.load()
+                             << ", Nulls: " << null_count.load();
+
+  // The key test: we should complete all operations without crashing
+  int total_operations = num_threads * 20;
+  int completed_operations = success_count.load() + null_count.load();
+  EXPECT_EQ(total_operations, completed_operations)
+      << "All operations should complete without crashing";
+}
+
+TEST_F(AdbClientTest, ConnectionMoveRaceCondition) {
+  // Test simultaneous access timing to GetSyncService
+  // This test verifies thread safety when multiple threads start at exactly
+  // the same time, maximizing the chance of hitting precise timing conflicts
+  // Catches race conditions that occur with synchronized simultaneous access
+
+  AdbClient adb_client("test_device");
+
+  // Try to trigger the exact race condition by having multiple threads
+  // simultaneously call GetSyncService
+
+  std::atomic<bool> start_flag{false};
+  std::vector<std::thread> threads;
+  std::atomic<int> null_service_count{0};
+  std::atomic<int> valid_service_count{0};
+
+  const int num_threads = 10;
+
+  // Create threads that will all start simultaneously
+  for (int i = 0; i < num_threads; ++i) {
+    threads.emplace_back([&]() {
+      // Wait for all threads to be ready
+      while (!start_flag.load()) {
+        std::this_thread::yield();
+      }
+
+      Status error;
+      auto sync_service = adb_client.GetSyncService(error);
+
+      if (sync_service == nullptr) {
+        null_service_count++;
+      } else {
+        valid_service_count++;
+      }
+    });
+  }
+
+  // Start all threads simultaneously to maximize chance of race condition
+  start_flag.store(true);
+
+  // Wait for all threads to complete
+  for (auto &thread : threads) {
+    thread.join();
+  }
+
+  // The test passes if we don't crash
+  int total_results = null_service_count.load() + valid_service_count.load();
+  EXPECT_EQ(num_threads, total_results)
+      << "All threads should complete without crashing. "
+      << "Null services: " << null_service_count.load()
+      << ", Valid services: " << valid_service_count.load();
+}
+
 } // end namespace platform_android
 } // end namespace lldb_private

@cs01 cs01 marked this pull request as draft June 23, 2025 18:43
@JDevlieghere JDevlieghere requested review from labath and splhack June 23, 2025 22:44
@royitaqi royitaqi requested a review from clayborg June 24, 2025 01:20
@cs01 cs01 marked this pull request as ready for review June 24, 2025 01:42
Collaborator

@labath labath left a comment

That's one way to fix the problem. Another (maybe even more obvious) would be to use a single SyncService object but synchronize all access to it. I think your approach makes sense -- it unlocks more parallelism, though that parallelism may be a mirage if the device connection is going to be the bottleneck. And all the extra connections may add some overhead. Have you looked into how the two approaches compare?

As for the patch itself, the code makes sense, but the result looks very path-dependent (meaning: you'd never write code like this -- creating a temporary AdbClient object -- if you were writing this from scratch). It's also not optimal, as you're creating a temporary AdbClient in PlatformAndroid::GetSyncService and then creating another temporary object inside AdbClient::GetSyncService.

I think this would look better if PlatformAndroid::GetSyncService constructed a (new) SyncService directly. And the SyncService could create a temporary AdbClient object as an implementation detail (or not -- maybe it could handle all of the connection setup internally)
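The alternative raised at the start of this review, a single shared SyncService with all access synchronized, could be sketched roughly as follows. This is a simplified model under stated assumptions: `FakeSyncService`, `SynchronizedSyncService`, and `RunConcurrentStats` are hypothetical names, and the counter merely stands in for the stateful sync protocol.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sync service with stateful, non-thread-safe operations.
class FakeSyncService {
public:
  // Pretends to stat a file; the counter stands in for protocol state.
  int Stat() { return ++m_requests; }
  int Requests() const { return m_requests; }

private:
  int m_requests = 0;
};

// Wrapper that serializes every call to the single shared service.
class SynchronizedSyncService {
public:
  int Stat() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_service.Stat();
  }

  int RequestCount() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_service.Requests();
  }

private:
  std::mutex m_mutex;
  FakeSyncService m_service;
};

// Issues calls from several threads and returns the total request count.
inline int RunConcurrentStats(int num_threads, int calls_per_thread) {
  SynchronizedSyncService shared;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&shared, calls_per_thread] {
      for (int j = 0; j < calls_per_thread; ++j)
        shared.Stat();
    });
  }
  for (std::thread &t : threads)
    t.join();
  return shared.RequestCount();
}
```

The trade-off is the one described in the review: this keeps a single connection but serializes every file operation, whereas one connection per service allows operations to overlap at the cost of extra connections.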

@dmpots dmpots requested a review from jeffreytan81 June 26, 2025 16:04
@jeffreytan81
Contributor

@cs01, can you get a more symbolicated stacktrace for the crash with debug info and line info?
From the current callstack, it looks like a lifetime (use-after-free) issue: is the SyncService trying to access the underlying m_conn object after it has been destroyed?
If so, would switching to shared_ptr<SyncService> (with proper thread-safety protection) fix the issue, instead of creating the object for each access?

@labath
Collaborator

labath commented Jun 27, 2025

"use after free" and "race condition" aren't mutually exclusive. The backtrace is consistent with one thread destroying (moving from) the m_conn object, while another thread is accessing it.

@cs01
Contributor Author

cs01 commented Jun 27, 2025

Thank you both for the feedback. I started working on changes that add more mutexes to AdbClient methods, use a shared pointer for the sync service, and create the SyncService in a cleaner, less path-dependent way (I agree with your comments, @labath). I will push an update either today or sometime next week.

@cs01
Contributor Author

cs01 commented Jun 27, 2025

After more analysis, I think shared_ptr will be too difficult to implement, since the connection and client manage their lifecycle on the assumption that they are the only client/thread using it. It will be challenging to ensure the connection doesn't get disconnected or freed while other threads are using it. Creating a new connection for each SyncService should add negligible overhead (establishing the connection takes several bytes), especially compared to pushing or pulling a file. I tested a build of this in a real-world scenario and the performance seemed unchanged.
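The per-service connection approach described here can be modeled as below. This is a sketch, not the code in the PR: the `Fake*` types and `CountConnectedServices` are hypothetical, and creating a `FakeConnection` stands in for connecting to the adb server and starting sync.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a connection to the adb server.
struct FakeConnection {
  std::string device_id;
};

// Each service owns exactly one connection for its whole lifetime.
class FakeSyncService {
public:
  explicit FakeSyncService(std::unique_ptr<FakeConnection> conn)
      : m_conn(std::move(conn)) {}
  bool IsConnected() const { return m_conn != nullptr; }

private:
  std::unique_ptr<FakeConnection> m_conn;
};

// The client only stores immutable configuration, so concurrent calls to
// GetSyncService share no mutable state.
class FakeAdbClient {
public:
  explicit FakeAdbClient(std::string device_id)
      : m_device_id(std::move(device_id)) {}

  std::unique_ptr<FakeSyncService> GetSyncService() const {
    // A fresh connection per call; nothing is moved out of the client.
    auto conn = std::make_unique<FakeConnection>();
    conn->device_id = m_device_id;
    return std::make_unique<FakeSyncService>(std::move(conn));
  }

private:
  const std::string m_device_id;
};

// Requests one service per thread and counts how many are connected.
inline int CountConnectedServices(int num_threads) {
  FakeAdbClient client("test_device");
  std::atomic<int> connected{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&client, &connected] {
      std::unique_ptr<FakeSyncService> service = client.GetSyncService();
      if (service->IsConnected())
        ++connected;
    });
  }
  for (std::thread &t : threads)
    t.join();
  return connected.load();
}
```

Because each service is created, used, and destroyed by one thread, no lock is needed around the connection itself in this model.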

@cs01
Contributor Author

cs01 commented Jul 3, 2025

It has been a week since the last update was published. Friendly ping to @labath @jeffreytan81 (or anyone else who wants to review)

Collaborator

@labath labath left a comment

I'm not very fond of the amount of validity checks this PR is adding. What's up with that? Is it somehow related to you passing a non-nullptr-but-uninitialized Connection object (std::make_unique<ConnectionFileDescriptor>()) into the AdbClient? Any chance to get rid of that? Maybe by using a nullptr to mean "no connection"? Or by reducing the amount of moving around?
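As an illustration of the "nullptr means no connection" suggestion, the sketch below keeps the connection pointer null until a connection exists, so a single null check replaces separate validity checks on a constructed-but-unconnected object. `FakeConnection`, `FakeAdbClient`, and `NullMeansNoConnection` are hypothetical names, and `Connect` here always succeeds, which the real code cannot assume.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an established connection.
struct FakeConnection {};

class FakeAdbClient {
public:
  // Creates the connection object only when a connection is established.
  void Connect() { m_conn = std::make_unique<FakeConnection>(); }

  // Dropping the connection resets the pointer back to null.
  void Disconnect() { m_conn.reset(); }

  // The only validity check needed: is there a connection object at all?
  bool IsConnected() const { return m_conn != nullptr; }

private:
  std::unique_ptr<FakeConnection> m_conn; // nullptr means "no connection"
};

// Walks through the three states: before connect, connected, disconnected.
inline bool NullMeansNoConnection() {
  FakeAdbClient client;
  const bool before = client.IsConnected();
  client.Connect();
  const bool during = client.IsConnected();
  client.Disconnect();
  const bool after = client.IsConnected();
  return !before && during && !after;
}
```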

@cs01
Contributor Author

cs01 commented Jul 10, 2025

Thanks for the feedback, I agree with you, it was confusing. I was trying to do the minimal change to make it work without too big of a refactor since this is my first PR to lldb. Based on your feedback, I did a more extensive refactor.

  • Created new files AdbClientUtils.cpp and AdbSyncService.cpp
  • Modified AdbClient and AdbSyncService to always have one unique_ptr to their own connection
    • Created various functions that receive a connection as an argument
  • I added some logs to better understand and debug what's going on. Here are some results from my testing.

In lldb, turn on logs

(lldb) log enable -f /tmp/lldb.log lldb platform

then attach to an android process (e.g. process attach --pid 123) then open /tmp/lldb.log:

lldb-dap         PlatformAndroid::CreateInstance(force=true, arch={<null>,<null>})
lldb-dap         PlatformAndroid::CreateInstance() creating remote-android platform
lldb-dap         AdbClient::AdbClient() - Creating AdbClient with default constructor
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         Connected to Android device "127.0.0.1:15558"
lldb-dap         Forwarding remote TCP port 46403 to local TCP port 48341
lldb-dap         AdbClient::~AdbClient() - Destroying AdbClient for device: 127.0.0.1:15558
lldb-dap         Rewritten platform connect URL: connect://127.0.0.1:48341
lldb-dap         AdbClient::AdbClient() - Creating AdbClient with default constructor
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         AdbClient::~AdbClient() - Destroying AdbClient for device: 127.0.0.1:15558
lldb-dap         AdbClient::AdbClient(device_id='127.0.0.1:15558') - Creating AdbClient with device ID
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         Selecting device: 127.0.0.1:15558
lldb-dap         AdbClient::~AdbClient() - Destroying AdbClient for device: 127.0.0.1:15558
lldb-dap         PlatformRemoteGDBServer::GetRemoteWorkingDirectory() -> '/'
lldb-dap         AdbClient::AdbClient() - Creating AdbClient with default constructor
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         Connected to Android device "127.0.0.1:15558"
lldb-dap         Forwarding remote TCP port 34309 to local TCP port 33451
lldb-dap         AdbClient::~AdbClient() - Destroying AdbClient for device: 127.0.0.1:15558
lldb-dap         gdbserver connect URL: connect://127.0.0.1:33451

In lldb, get a file to confirm the sync service works:

platform get-file /system/build.prop /tmp/build.prop

shows

lldb-dap         AdbSyncService::AdbSyncService() - Creating AdbSyncService for device: 127.0.0.1:15558
lldb-dap         Connecting to ADB server at connect://127.0.0.1:5037
lldb-dap         Selecting device: 127.0.0.1:15558
lldb-dap         AdbSyncService::~AdbSyncService() - Destroying AdbSyncService for device: 127.0.0.1:15558

I also added an implementation for PlatformAndroid::FindProcesses. Previously it defaulted to the Linux implementation, which made process attach by name (e.g. process attach --name com.android.settings) fail on Android with attach failed: could not find a process named com.android.settings. This PR is already quite big, so if you want me to move it to a separate one, just let me know and I'm happy to do that.

@cs01 cs01 changed the title [lldb] make PlatformAndroid/AdbClient::GetSyncService threadsafe [lldb] refactor PlatformAndroid Jul 10, 2025
@cs01
Contributor Author

cs01 commented Jul 17, 2025

Friendly ping @labath and anyone else who has any thoughts

@SiamAbdullah
Contributor

@JDevlieghere @adrian-prantl, would you review this diff and unblock @cs01 ?

Collaborator

@clayborg clayborg left a comment

Anything we can do to help move this along? I would love the people with Android expertise to comment and help move this along, or I can approve early next week if we have no further comments.

Collaborator

@labath labath left a comment

I like what you've done, though it makes it a bit difficult to review. In the future, if you find yourself moving a lot of code around, try to separate that from the functional changes into a PR of its own. That makes things flow a lot easier.

In this case, it might also have made sense to keep everything in a single file. There's not that much code, and it would avoid this business with namespaces.

In fact, I'd suggest moving everything back into a single file anyway, as I think it's the easiest way to resolve the question of naming the namespace. I see what you tried to do, but I don't think adb_client_utils is a good name. I think it would make more sense to have an adb namespace which has all of the adb-related code, but that would create even more churn. If you just move things back roughly where they originally were, it should also reduce the size of the diff.

@cs01 cs01 marked this pull request as draft July 29, 2025 00:15
Collaborator

@labath labath left a comment

I have a feeling something went wrong with the latest update. (I don't think you meant to undo last month's changes to ConnectionFileDescriptor.) Can you fix that? I'm fine with a force-push, if needed.

Did you also want to drop the "Draft" annotation (mark as ready for review)?

@cs01
Contributor Author

cs01 commented Jul 29, 2025

Hi, yes, sorry about that. I did not mean to push the file in that state, so I marked the PR as draft while I get it in order. Other than the ConnectionFileDescriptor changes, it's generally ready for review, if you want to take a look with that in mind. By the way, I see there are workflows awaiting approval; is that something that is done before or after the PR is approved?

@cs01
Contributor Author

cs01 commented Jul 29, 2025

@labath I think I have everything in order now, it's ready for another pass. Thank you so much for your time and help on this!

@cs01 cs01 requested a review from labath July 29, 2025 17:04
@cs01 cs01 force-pushed the cs01/adbclient-threadsafe branch from 4756820 to 0f906e2 Compare July 29, 2025 17:15
@cs01
Contributor Author

cs01 commented Aug 7, 2025

Friendly ping @labath

@cs01 cs01 changed the title [lldb] refactor PlatformAndroid [lldb] refactor PlatformAndroid and make threadsafe Aug 7, 2025
Collaborator

@labath labath left a comment

Looks good. Just a couple of details.

@cs01 cs01 requested review from JDevlieghere, labath and metacpp August 8, 2025 23:39
@cs01
Contributor Author

cs01 commented Aug 11, 2025

@labath @JDevlieghere thanks again for all the feedback. I believe I have addressed your comments.

@cs01
Contributor Author

cs01 commented Aug 13, 2025

@labath Is there anything else needed of me? Looks like there are workflows awaiting approval still.

@JDevlieghere JDevlieghere enabled auto-merge (squash) August 13, 2025 22:21
@JDevlieghere
Member

> @labath Is there anything else needed of me? Looks like there are workflows awaiting approval still.

I ran them and enabled auto-merge so this PR will get merged automatically if everything passes.

@JDevlieghere JDevlieghere merged commit bcb48aa into llvm:main Aug 13, 2025
9 checks passed

@cs01 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 13, 2025
@llvm-ci
Collaborator

llvm-ci commented Aug 13, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building lldb at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/22573

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/8/12 (2299 of 2308)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/9/12 (2300 of 2308)
PASS: lldb-unit :: Utility/./UtilityTests/4/9 (2301 of 2308)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/0/2 (2302 of 2308)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/1/2 (2303 of 2308)
PASS: lldb-unit :: Host/./HostTests/4/9 (2304 of 2308)
PASS: lldb-unit :: Target/./TargetTests/11/14 (2305 of 2308)
PASS: lldb-unit :: Host/./HostTests/5/9 (2306 of 2308)
PASS: lldb-unit :: Process/gdb-remote/./ProcessGdbRemoteTests/8/9 (2307 of 2308)
UNRESOLVED: lldb-api :: tools/lldb-server/TestLldbGdbServer.py (2308 of 2308)
******************** TEST 'lldb-api :: tools/lldb-server/TestLldbGdbServer.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/tools/lldb-server -p TestLldbGdbServer.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision bcb48aa5b2cfc75967c734a97201e0c91273169d)
  clang revision bcb48aa5b2cfc75967c734a97201e0c91273169d
  llvm revision bcb48aa5b2cfc75967c734a97201e0c91273169d
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hc_then_Csignal_signals_correct_thread_launch_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hc_then_Csignal_signals_correct_thread_launch_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_fails_on_another_pid_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_fails_on_minus_one_pid_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_fails_on_zero_pid_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_switches_to_3_threads_launch_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_Hg_switches_to_3_threads_launch_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_and_p_thread_suffix_work_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_and_p_thread_suffix_work_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_writes_all_gpr_registers_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_P_writes_all_gpr_registers_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_attach_commandline_continue_app_exits_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
lldb-server exiting...
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_attach_commandline_continue_app_exits_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_c_packet_works_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
lldb-server exiting...
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_c_packet_works_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_first_launch_stop_reply_thread_matches_first_qC_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_first_launch_stop_reply_thread_matches_first_qC_llgs (TestLldbGdbServer.LldbGdbServerTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_hardware_breakpoint_set_and_remove_work_debugserver (TestLldbGdbServer.LldbGdbServerTestCase) (test case does not fall in any category of interest for this run) 
lldb-server exiting...
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_hardware_breakpoint_set_and_remove_work_llgs (TestLldbGdbServer.LldbGdbServerTestCase)

@llvm-ci
Collaborator

llvm-ci commented Aug 13, 2025

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-win running on as-builder-10 while building lldb at step 16 "test-check-lldb-unit".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/197/builds/7996

Here is the relevant piece of the build log for reference:
Step 16 (test-check-lldb-unit) failure: Test just built components: check-lldb-unit completed (failure)
...
-- Builtin supported architectures: aarch64
-- Configuring done
-- Generating done
-- Build files have been written to: C:/buildbot/as-builder-10/lldb-x-aarch64/build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins
0.517 [0/1/1]Generating C:/buildbot/as-builder-10/lldb-x-aarch64/build/compile_commands.json
45/16/246]Completed 'builtins-aarch64-unknown-linux-gnu'
0.960 [44/16/246]Performing configure step for 'runtimes-aarch64-unknown-linux-gnu'
3.433 [41/12/254]Linking CXX executable tools\lldb\unittests\ABI\AArch64\ABIAArch64Tests.exe
3.678 [39/12/255]Performing build step for 'runtimes-aarch64-unknown-linux-gnu'
3.895 [39/12/256]Building CXX object tools\lldb\unittests\Platform\Android\CMakeFiles\AdbClientTests.dir\AdbClientTest.cpp.obj
FAILED: tools/lldb/unittests/Platform/Android/CMakeFiles/AdbClientTests.dir/AdbClientTest.cpp.obj 
ccache C:\PROGRA~1\MICROS~1\2022\COMMUN~1\VC\Tools\MSVC\1444~1.352\bin\Hostx64\x64\cl.exe  /nologo /TP -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\buildbot\as-builder-10\lldb-x-aarch64\build\tools\lldb\unittests\Platform\Android -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\unittests\Platform\Android -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\build\tools\lldb\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\build\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\llvm\include -IC:\Python312\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\llvm\..\clang\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\build\tools\lldb\..\clang\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\source -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\unittests -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\source\Plugins\Platform\Android -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\third-party\unittest\googletest\include -IC:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\third-party\unittest\googlemock\include -D__OPTIMIZE__ /Zc:inline /Zc:preprocessor /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -wd4251 -wd4275 -w14062 -we4238 /Gw /O2 /Ob2  -MD   -wd4018 -wd4068 -wd4150 -wd4201 -wd4251 -wd4521 -wd4530 
-wd4589  /EHs-c- /GR- -UNDEBUG -std:c++17 /showIncludes /Fotools\lldb\unittests\Platform\Android\CMakeFiles\AdbClientTests.dir\AdbClientTest.cpp.obj /Fdtools\lldb\unittests\Platform\Android\CMakeFiles\AdbClientTests.dir\ /FS -c C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\unittests\Platform\Android\AdbClientTest.cpp
C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\unittests\Platform\Android\AdbClientTest.cpp(116): error C3861: 'setenv': identifier not found
4.252 [39/11/257]Linking CXX executable tools\lldb\unittests\UnwindAssembly\x86-but-no-x86-target\UnwindAssemblyX86ButNoX86TargetTests.exe
5.133 [39/9/259]Linking CXX executable tools\lldb\unittests\DataFormatter\LLDBFormatterTests.exe
6.126 [39/8/260]Linking CXX executable tools\lldb\unittests\UnwindAssembly\ARM64\Arm64InstEmulationTests.exe
6.542 [39/7/261]Linking CXX executable tools\lldb\unittests\Thread\ThreadTests.exe
6.553 [39/6/262]Linking CXX executable tools\lldb\unittests\Core\LLDBCoreTests.exe
7.492 [39/5/263]Linking CXX executable tools\lldb\unittests\Callback\LLDBCallbackTests.exe
9.313 [39/4/264]Linking CXX executable bin\lldb-test.exe
9.912 [39/3/265]Linking CXX executable tools\lldb\unittests\Expression\ExpressionTests.exe
11.328 [39/2/266]Building CXX object tools\lldb\unittests\Target\CMakeFiles\TargetTests.dir\LocateModuleCallbackTest.cpp.obj
13.004 [39/1/267]Building CXX object tools\lldb\unittests\Platform\Android\CMakeFiles\AdbClientTests.dir\PlatformAndroidTest.cpp.obj
ninja: build stopped: subcommand failed.

@slydiman
Contributor

@cs01

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-win

C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\unittests\Platform\Android\AdbClientTest.cpp(116): error C3861: 'setenv': identifier not found

Please take a look and fix it ASAP.
setenv() is not available with MSVC, so it probably needs to be replaced with a Windows-compatible equivalent such as _putenv_s().
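
For reference, a minimal sketch of one way the call could be made portable, assuming the test only needs to set an environment variable before exercising the code under test. `SetEnvPortable`, `GetEnvOrEmpty`, and `EXAMPLE_VAR` are hypothetical names for illustration, not part of LLDB or the original test; `_putenv_s` is the MSVC CRT function and `setenv` is the POSIX one.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the PR): set an environment variable on both
// POSIX and Windows toolchains. Returns true on success.
static bool SetEnvPortable(const char *name, const char *value) {
#ifdef _WIN32
  // MSVC's CRT has no setenv(); _putenv_s() is its closest equivalent.
  return _putenv_s(name, value) == 0;
#else
  // POSIX setenv(); the final argument (1) overwrites any existing value.
  return setenv(name, value, 1) == 0;
#endif
}

// Hypothetical convenience wrapper: read a variable back, empty if unset.
static std::string GetEnvOrEmpty(const char *name) {
  const char *value = std::getenv(name);
  return value ? std::string(value) : std::string();
}
```

A test could then call `SetEnvPortable("EXAMPLE_VAR", "value")` where it previously called `setenv` directly. One behavioral difference to keep in mind: on Windows, passing an empty string to `_putenv_s` removes the variable, whereas POSIX `setenv` sets it to an empty value.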

@luporl
Contributor

luporl commented Aug 14, 2025

This also broke lldb-aarch64-windows: https://lab.llvm.org/buildbot/#/builders/141/builds/10795

luporl added a commit that referenced this pull request Aug 14, 2025
luporl added a commit that referenced this pull request Aug 14, 2025
@luporl
Contributor

luporl commented Aug 14, 2025

Reverted to fix the buildbots. Please let me know if any help is needed to reproduce the issue or test a fix.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 14, 2025