@AscendTransport
Contributor

@AscendTransport AscendTransport commented Nov 18, 2025

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@gemini-code-assist
Contributor

Summary of Changes

Hello @AscendTransport, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new HeterogeneousTcpTransport to enable TCP-based data transfers between Ascend NPUs and GPUs. This new transport mechanism works by managing dedicated host and device memory buffers and using asynchronous memory copy operations to move data between the NPU and host, before leveraging the existing TcpTransport for host-to-host communication. The changes include updates to the build system and core transfer engine logic to support this new heterogeneous transfer path.

Highlights

  • New TCP Transport for Heterogeneous Ascend/GPU: Introduced a new CMake option USE_ASCEND_HETEROGENEOUS_TCP and a corresponding HeterogeneousTcpTransport class to enable TCP-based data transfers between Ascend NPUs and GPUs.
  • Memory Management for Device-Host Transfers: The HeterogeneousTcpTransport manages dedicated host (aligned_alloc) and device (aclrtMalloc) memory buffers, utilizing asynchronous memory copy operations (aclrtMemcpyAsync) to efficiently move data between the NPU and host before TCP transmission.
  • Integration with Existing Transfer Engine: The new transport is wired into the MultiTransport and TransferEngine components and is selected according to the build configuration.
  • Build System Updates: CMake files across mooncake-common, mooncake-transfer-engine, and mooncake-integration have been updated to support the new build option, ensuring proper compilation and linking of the new transport.
  • TcpTransport::install Visibility: The install method in TcpTransport was changed from private to public, enabling HeterogeneousTcpTransport to properly utilize its functionality for underlying TCP communication.
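The following is a minimal sketch of the staging path described in the summary above. It assumes a fixed-size staging buffer that is refilled once per chunk. stageInChunks and its send callback are hypothetical names. std::memcpy stands in for aclrtMemcpyAsync followed by a stream synchronization, and the real transport's chunking and synchronization may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical helper: moves `len` bytes from `src` through a bounded staging
// buffer and hands each filled chunk to `send`, which stands in for the
// host-to-host TcpTransport step. Returns the number of chunks sent.
std::size_t stageInChunks(
    const char* src, std::size_t len, char* staging, std::size_t staging_size,
    const std::function<void(const char*, std::size_t)>& send) {
    if (staging_size == 0) {
        return 0;  // nothing can be staged through an empty buffer
    }
    std::size_t chunks = 0;
    for (std::size_t off = 0; off < len; off += staging_size) {
        std::size_t n = std::min(staging_size, len - off);
        // Stand-in for aclrtMemcpyAsync (device -> host) plus a stream sync.
        std::memcpy(staging, src + off, n);
        send(staging, n);
        ++chunks;
    }
    return chunks;
}
```

A single staging buffer keeps memory use bounded but serializes the copy and the send. Double-buffering is one common refinement, because it would let the next copy overlap the current send.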

@AscendTransport changed the title from heterogeneous-tcp to [TransferEngine][WIP]heterogeneous-tcp on Nov 18, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new TCP-based transport for heterogeneous communication involving Ascend NPUs. The changes are extensive, touching CMake build scripts, integration code, and the core transfer engine. While the feature addition is significant, the current implementation has several critical issues that must be addressed. These include compilation errors from incorrect include paths and non-standard function usage, potential data corruption and race conditions due to incorrect types and synchronization, and performance bottlenecks from busy-waiting. I've provided specific comments and suggestions to resolve these problems.

#include "transport/ascend_transport/heterogeneous_transport/heterogeneous_rdma_transport.h"
#endif
#ifdef USE_ASCEND_HETEROGENEOUS_TCP
#include "transport/ascend_transport/heterogeneous_transport/heterogeneous_tcp_transport.h"
Contributor

critical

The include path for heterogeneous_tcp_transport.h appears to be incorrect. Based on the file's location (mooncake-transfer-engine/include/transport/ascend_transport/heterogeneous_tcp_transport.h) and the project's include paths, the extra heterogeneous_transport/ directory in the path is likely a copy-paste error and will cause a compilation failure.

Suggested change
-#include "transport/ascend_transport/heterogeneous_transport/heterogeneous_tcp_transport.h"
+#include "transport/ascend_transport/heterogeneous_tcp_transport.h"

Contributor Author

fixed

Comment on lines +233 to +237
#ifdef USE_ASCEND_HETEROGENEOUS_TCP
else if (std::string(proto) == "ascend") {
transport = new HeterogeneousTcpTransport();
}
#endif
Contributor

critical

This else if block is conditional on USE_ASCEND_HETEROGENEOUS_TCP, but it's syntactically attached to the preceding if-else if chain. If USE_ASCEND_HETEROGENEOUS is disabled, this else if will be dangling and cause a compilation error. The entire block for selecting an ascend transport should be restructured to handle the mutually exclusive options correctly, for example by using a single if-else if chain with #if/#elif/#endif preprocessor directives to ensure correctness regardless of which flags are enabled.
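The following sketch shows one way to express the mutually exclusive options. The stub classes stand in for the real transports, and makeTransport is a hypothetical factory. The #error guard reflects an assumption: enabling both flags should be rejected at compile time instead of silently preferring one of them.

```cpp
#include <memory>
#include <string>

// Reject builds that enable both Ascend transports (assumed policy).
#if defined(USE_ASCEND_HETEROGENEOUS) && defined(USE_ASCEND_HETEROGENEOUS_TCP)
#error "USE_ASCEND_HETEROGENEOUS and USE_ASCEND_HETEROGENEOUS_TCP are mutually exclusive"
#endif

// Hypothetical stand-ins for the real transport classes.
struct Transport {
    virtual ~Transport() = default;
    virtual std::string name() const = 0;
};
struct TcpTransportStub : Transport {
    std::string name() const override { return "tcp"; }
};
struct HeterogeneousRdmaStub : Transport {
    std::string name() const override { return "ascend-rdma"; }
};
struct HeterogeneousTcpStub : Transport {
    std::string name() const override { return "ascend-tcp"; }
};

// One if / else-if chain: each build configuration compiles in at most one
// "ascend" branch, always attached to the same leading `if`.
std::unique_ptr<Transport> makeTransport(const std::string& proto) {
    if (proto == "tcp") {
        return std::make_unique<TcpTransportStub>();
    }
#if defined(USE_ASCEND_HETEROGENEOUS)
    else if (proto == "ascend") {
        return std::make_unique<HeterogeneousRdmaStub>();
    }
#elif defined(USE_ASCEND_HETEROGENEOUS_TCP)
    else if (proto == "ascend") {
        return std::make_unique<HeterogeneousTcpStub>();
    }
#endif
    return nullptr;  // unknown or disabled protocol
}
```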

Contributor Author

fixed

return ret;
}

hostAddr_ = std::aligned_alloc(64, HUGE_HOST_SIZE);
Contributor

critical

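std::aligned_alloc is only available from C++17 onward (declared in <cstdlib>), so this line can fail to compile when the project is built with an older language standard. It also returns void*, which C++ does not implicitly convert to char*. If hostAddr_ is a char*, as the suggested cast implies, the assignment needs an explicit cast. You should include <cstdlib>, use aligned_alloc, and cast the result explicitly. The memory is correctly freed with free() in the destructor.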

Suggested change
-hostAddr_ = std::aligned_alloc(64, HUGE_HOST_SIZE);
+hostAddr_ = static_cast<char*>(aligned_alloc(64, HUGE_HOST_SIZE));
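If the buffer is kept as a raw pointer, wrapping it in a std::unique_ptr with a free()-based deleter releases it on every exit path. The sketch below assumes a POSIX/glibc toolchain where aligned_alloc is visible through <cstdlib>. allocHostBuffer, FreeDeleter, and isAligned are hypothetical names. The sketch rounds the size up to a multiple of the alignment because some C library versions require that of aligned_alloc.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Releases memory obtained from aligned_alloc.
struct FreeDeleter {
    void operator()(char* p) const noexcept { std::free(p); }
};
using HostBuffer = std::unique_ptr<char, FreeDeleter>;

// Allocates at least `size` bytes aligned to `alignment` (a power of two).
// The size is rounded up to a multiple of the alignment for portability.
// Returns an empty HostBuffer on failure.
HostBuffer allocHostBuffer(std::size_t alignment, std::size_t size) {
    std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    return HostBuffer(static_cast<char*>(aligned_alloc(alignment, rounded)));
}

// True when `p` is a multiple of `alignment`.
bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

HUGE_HOST_SIZE (3 GiB) is already a multiple of 64, so the rounding changes nothing for that particular allocation.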

Contributor Author

fixed

Comment on lines 51 to 55
if (transferQueues_.empty()) {
transfer_cond_.wait(lock);
}
Contributor

high

The wait on the condition variable should be done in a loop or with a predicate to handle spurious wakeups. The current implementation might cause the thread to wake up, find the queue empty, and then crash when trying to access transferQueues_.front(). A predicate in the wait call is the standard way to solve this.

Suggested change
-if (transferQueues_.empty()) {
-    transfer_cond_.wait(lock);
-}
+transfer_cond_.wait(lock, [this] { return !transferQueues_.empty() || !running_; });
+if (transferQueues_.empty()) {
+    continue;
+}
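The predicate form can be exercised in isolation. The sketch below is a hypothetical TaskQueue, with int tasks standing in for the transport's transferQueues_. It shows the predicate wait and a shutdown path that drains any remaining work before pop() reports that the queue is closed.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical queue illustrating a predicate-based condition-variable wait.
class TaskQueue {
public:
    void push(int task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(task);
        }
        cond_.notify_one();
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            running_ = false;
        }
        cond_.notify_all();
    }

    // Blocks until a task is available or stop() has been called. The
    // predicate is re-checked after every wakeup, so spurious wakeups are
    // harmless. Returns false only when stopped and the queue is empty.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mu_);
        cond_.wait(lock, [this] { return !tasks_.empty() || !running_; });
        if (tasks_.empty()) {
            return false;  // stopped with nothing left to process
        }
        out = tasks_.front();
        tasks_.pop();
        return true;
    }

private:
    std::mutex mu_;
    std::condition_variable cond_;
    std::queue<int> tasks_;
    bool running_ = true;  // guarded by mu_
};
```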

Contributor Author

fixed

<< "HeterogeneousTcpTransport: empty transfer task batch";
continue;
}
int total_length = pkg.total_length;
Contributor

high

There is a potential data truncation here. pkg.total_length is a uint64_t, but it's being assigned to an int. On a typical 64-bit system, int is 32 bits, so any transfer larger than INT_MAX bytes (just under 2 GiB) produces an incorrect length. The total_length variable should be uint64_t to match the type in the TransferTaskTCP struct.

Suggested change
-int total_length = pkg.total_length;
+uint64_t total_length = pkg.total_length;
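HUGE_HOST_SIZE is 3 GiB, so a batch that fills the host buffer would not fit in a 32-bit int. A small compile-time check makes this explicit. fitsInInt is a hypothetical helper, and kHugeHostSize simply mirrors the HUGE_HOST_SIZE macro from the review. The sketch assumes a platform with a 32-bit int.

```cpp
#include <cstdint>
#include <limits>

// Mirrors HUGE_HOST_SIZE from the review: 3 GiB.
constexpr std::uint64_t kHugeHostSize = 3ULL * 1024 * 1024 * 1024;

// Hypothetical helper: true when `len` is representable as an int
// (on typical platforms, int is 32 bits, so the limit is 2^31 - 1).
constexpr bool fitsInInt(std::uint64_t len) {
    return len <= static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}
```

Once total_length is declared as uint64_t, this check is only relevant where a length must be handed to an API that takes an int.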

Contributor Author

fixed

Comment on lines 25 to 26
#define HUGE_HOST_SIZE 3ULL * 1024 * 1024 * 1024
#define HUGE_DEVICE_SIZE 8 * 1024 * 1024
Contributor

medium

It's a good practice to wrap macro definitions containing expressions in parentheses to avoid potential operator precedence issues when they are used in more complex expressions. While it may not be causing a bug here, it improves code robustness.

Suggested change
-#define HUGE_HOST_SIZE 3ULL * 1024 * 1024 * 1024
-#define HUGE_DEVICE_SIZE 8 * 1024 * 1024
+#define HUGE_HOST_SIZE (3ULL * 1024 * 1024 * 1024)
+#define HUGE_DEVICE_SIZE (8 * 1024 * 1024)
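Division makes the precedence hazard easy to see. HUGE_DEVICE_SIZE_UNSAFE reproduces the original unparenthesized definition under a hypothetical name. kHugeDeviceSize shows a typed constexpr alternative that avoids textual macro expansion entirely.

```cpp
#include <cstdint>

// Original, unparenthesized form (renamed for the demonstration).
#define HUGE_DEVICE_SIZE_UNSAFE 8 * 1024 * 1024
// Parenthesized form from the suggested change.
#define HUGE_DEVICE_SIZE (8 * 1024 * 1024)
// Typed alternative: no textual expansion, explicit 64-bit width.
constexpr std::uint64_t kHugeDeviceSize = 8ULL * 1024 * 1024;

constexpr std::uint64_t kOneGiB = 1024ULL * 1024 * 1024;

// Expands to kOneGiB / 8 * 1024 * 1024, i.e. ((kOneGiB / 8) * 1024) * 1024.
constexpr std::uint64_t kChunksUnsafe = kOneGiB / HUGE_DEVICE_SIZE_UNSAFE;
// Both of these compute kOneGiB / 8 MiB, as intended.
constexpr std::uint64_t kChunksMacro = kOneGiB / HUGE_DEVICE_SIZE;
constexpr std::uint64_t kChunksTyped = kOneGiB / kHugeDeviceSize;
```

The unparenthesized version yields 2^47 instead of 128, with no compiler diagnostic.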

Contributor Author

fixed

Comment on lines 63 to 64
Status getTransferStatus(BatchID batch_id,
std::vector<TransferStatus> &status);
Contributor

medium

This overload of getTransferStatus is declared but not defined in the corresponding .cpp file. It also doesn't override any base class method. If it's not used, it should be removed to avoid confusion and dead code.
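Marking overriding methods with override lets the compiler catch declarations like this one. The sketch uses hypothetical stand-ins (TransportBase plus simplified BatchID and TransferStatus types) in place of the real Mooncake types.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for the real types.
using BatchID = std::size_t;
enum class TransferStatus { kPending, kCompleted };

struct TransportBase {
    virtual ~TransportBase() = default;
    virtual TransferStatus getTransferStatus(BatchID batch_id,
                                             std::size_t task_id) = 0;
};

struct SketchTransport : TransportBase {
    // `override` makes the compiler verify that this matches a base virtual.
    TransferStatus getTransferStatus(BatchID, std::size_t) override {
        return TransferStatus::kCompleted;
    }
    // An extra overload with no base counterpart, declared with `override`,
    // would be rejected at compile time instead of lingering as dead code.
};
```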

Contributor Author

fixed

s = transport_->submitTransferTask(task_list);
if (!s.ok()) {
LOG(ERROR)
<< "HeterogeneousTcpTransport: Rdma submitTransferTask error";
Contributor

medium

The error message mentions Rdma, which seems to be a copy-paste error from another transport implementation. This should be corrected to Tcp or HeterogeneousTcp for accurate logging.

Suggested change
-<< "HeterogeneousTcpTransport: Rdma submitTransferTask error";
+<< "HeterogeneousTcpTransport: Tcp submitTransferTask error";

Contributor Author

fixed

Comment on lines 127 to 132
if (transport_ == nullptr) {
LOG(ERROR) << "HeterogeneousTcpTransport:transport is null";
return ret;
}
Contributor

medium

If transport_ is null, this function returns ret, which still holds the value from aclrtGetDevice. If that call succeeded, ret is 0, so the caller would see success even though the transport is missing. It would be better to return a specific error code, like -1, to indicate the failure due to the null transport.

Suggested change
-if (transport_ == nullptr) {
-    LOG(ERROR) << "HeterogeneousTcpTransport:transport is null";
-    return ret;
-}
+if (transport_ == nullptr) {
+    LOG(ERROR) << "HeterogeneousTcpTransport:transport is null";
+    return -1;
+}

Contributor Author

fixed


ret = transport_->install(local_server_name_, meta, topo);
if (ret) {
LOG(ERROR) << "HeterogeneousTcpTransport::RdmaTransport install error, ret: " << ret;
Contributor

medium

The error message mentions RdmaTransport, which seems to be a copy-paste error from another transport implementation. This should be corrected to TcpTransport for accurate logging.

Suggested change
-LOG(ERROR) << "HeterogeneousTcpTransport::RdmaTransport install error, ret: " << ret;
+LOG(ERROR) << "HeterogeneousTcpTransport::TcpTransport install error, ret: " << ret;

Contributor Author

fixed
