- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[HIP] Support compressing device binary #67162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-clang-driver ChangesAdd option -f[no-]offload-compress to clang to enable/disable compression of device binary for HIP. By default it is disabled. Add option -compress to clang-offload-bundler to enable compression of offload bundle. By default it is disabled. When enabled, zstd or zlib is used for compression when available. When disabled, it is NFC compared to previous behavior. The same offload bundle format is used as before. Clang-offload-bundler automatically detects whether the input file to be unbundled is compressed and the compression method and decompress if necessary. Patch is 36.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67162.diff 8 Files Affected: 
 diff --git a/clang/docs/ClangOffloadBundler.rst b/clang/docs/ClangOffloadBundler.rst
index d08bf4b97781fa4..1e21d3e7264d5c3 100644
--- a/clang/docs/ClangOffloadBundler.rst
+++ b/clang/docs/ClangOffloadBundler.rst
@@ -309,3 +309,30 @@ target by comparing bundle ID's. Two bundle ID's are considered compatible if:
   * Their offload kind are the same
   * Their target triple are the same
   * Their GPUArch are the same
+
+Compression and Decompression
+=============================
+
+``clang-offload-bundler`` provides features to compress and decompress the full
+bundle, leveraging inherent redundancies within the bundle entries. Use the
+`-compress` command-line option to enable this compression capability.
+
+The compressed offload bundle begins with a header followed by the compressed binary data:
+
+- **Magic Number (4 bytes)**:
+    This is a unique identifier to distinguish compressed offload bundles. The value is the string 'CCOB' (Compressed Clang Offload Bundle).
+
+- **Version Number (16-bit unsigned int)**:
+    This denotes the version of the compressed offload bundle format. The current version is `1`.
+
+- **Compression Method (16-bit unsigned int)**:
+    This field indicates the compression method used. The value corresponds to either `zlib` or `zstd`, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.
+
+- **Uncompressed Binary Size (32-bit unsigned int)**:
+    This is the size (in bytes) of the binary data before it was compressed.
+
+- **Hash (64-bit unsigned int)**:
+    This is a 64-bit truncated MD5 hash of the uncompressed binary data. It serves for verification and caching purposes.
+
+- **Compressed Data**:
+    The actual compressed binary data follows the header. Its size can be inferred from the total size of the file minus the header size.
diff --git a/clang/include/clang/Driver/OffloadBundler.h b/clang/include/clang/Driver/OffloadBundler.h
index 28473c53662de2c..17df31d31071d99 100644
--- a/clang/include/clang/Driver/OffloadBundler.h
+++ b/clang/include/clang/Driver/OffloadBundler.h
@@ -19,6 +19,7 @@
 
 #include "llvm/Support/Error.h"
 #include "llvm/TargetParser/Triple.h"
+#include <llvm/Support/MemoryBuffer.h>
 #include <string>
 #include <vector>
 
@@ -26,11 +27,15 @@ namespace clang {
 
 class OffloadBundlerConfig {
 public:
+  OffloadBundlerConfig();
+
   bool AllowNoHost = false;
   bool AllowMissingBundles = false;
   bool CheckInputArchive = false;
   bool PrintExternalCommands = false;
   bool HipOpenmpCompatible = false;
+  bool Compress = false;
+  bool Verbose = false;
 
   unsigned BundleAlignment = 1;
   unsigned HostInputIndex = ~0u;
@@ -84,6 +89,38 @@ struct OffloadTargetInfo {
   std::string str() const;
 };
 
+// CompressedOffloadBundle represents the format for the compressed offload
+// bundles.
+//
+// The format is as follows:
+// - Magic Number (4 bytes) - A constant "CCOB".
+// - Version (2 bytes)
+// - Compression Method (2 bytes) - Uses the values from
+// llvm::compression::Format.
+// - Uncompressed Size (4 bytes).
+// - Truncated MD5 Hash (8 bytes).
+// - Compressed Data (variable length).
+
+class CompressedOffloadBundle {
+private:
+  static inline const size_t MagicSize = 4;
+  static inline const size_t VersionFieldSize = sizeof(uint16_t);
+  static inline const size_t MethodFieldSize = sizeof(uint16_t);
+  static inline const size_t SizeFieldSize = sizeof(uint32_t);
+  static inline const size_t HashFieldSize = 8;
+  static inline const size_t HeaderSize = MagicSize + VersionFieldSize +
+                                          MethodFieldSize + SizeFieldSize +
+                                          HashFieldSize;
+  static inline const llvm::StringRef MagicNumber = "CCOB";
+  static inline const uint16_t Version = 1;
+
+public:
+  static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+  compress(const llvm::MemoryBuffer &Input, bool Verbose = false);
+  static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+  decompress(const llvm::MemoryBuffer &Input, bool Verbose = false);
+};
+
 } // namespace clang
 
 #endif // LLVM_CLANG_DRIVER_OFFLOADBUNDLER_H
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 0f93479170d73bc..b8295fa47327157 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1183,6 +1183,10 @@ def fgpu_inline_threshold_EQ : Joined<["-"], "fgpu-inline-threshold=">,
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
   HelpText<"Enable sanitizer for supported offloading devices">;
 def fno_gpu_sanitize : Flag<["-"], "fno-gpu-sanitize">, Group<f_Group>;
+
+def foffload_compress : Flag<["-"], "foffload-compress">,
+  HelpText<"Compress offload device binaries (HIP only)">;
+def fno_offload_compress : Flag<["-"], "fno-offload-compress">;
 }
 
 // CUDA options
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index d11c41605bf39ee..7737c71485f5bf3 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -21,6 +21,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Object/Archive.h"
@@ -28,17 +29,20 @@
 #include "llvm/Object/Binary.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/Compression.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/EndianStream.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/ErrorOr.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MD5.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/Program.h"
 #include "llvm/Support/Signals.h"
 #include "llvm/Support/StringSaver.h"
+#include "llvm/Support/Timer.h"
 #include "llvm/Support/WithColor.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/Host.h"
@@ -48,6 +52,7 @@
 #include <cstddef>
 #include <cstdint>
 #include <forward_list>
+#include <llvm/Support/Process.h>
 #include <memory>
 #include <set>
 #include <string>
@@ -58,6 +63,10 @@ using namespace llvm;
 using namespace llvm::object;
 using namespace clang;
 
+static llvm::TimerGroup
+    ClangOffloadBundlerTimerGroup("Clang Offload Bundler Timer Group",
+                                  "Timer group for clang offload bundler");
+
 /// Magic string that marks the existence of offloading data.
 #define OFFLOAD_BUNDLER_MAGIC_STR "__CLANG_OFFLOAD_BUNDLE__"
 
@@ -224,20 +233,22 @@ class FileHandler {
 
   /// Write the header of the bundled file to \a OS based on the information
   /// gathered from \a Inputs.
-  virtual Error WriteHeader(raw_fd_ostream &OS,
+  virtual Error WriteHeader(raw_ostream &OS,
                             ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) = 0;
 
   /// Write the marker that initiates a bundle for the triple \a TargetTriple to
   /// \a OS.
-  virtual Error WriteBundleStart(raw_fd_ostream &OS,
-                                 StringRef TargetTriple) = 0;
+  virtual Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) = 0;
 
   /// Write the marker that closes a bundle for the triple \a TargetTriple to \a
   /// OS.
-  virtual Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) = 0;
+  virtual Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) = 0;
 
   /// Write the bundle from \a Input into \a OS.
-  virtual Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) = 0;
+  virtual Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) = 0;
+
+  /// Finalize output file.
+  virtual Error finalizeOutputFile() { return Error::success(); }
 
   /// List bundle IDs in \a Input.
   virtual Error listBundleIDs(MemoryBuffer &Input) {
@@ -311,7 +322,7 @@ static uint64_t Read8byteIntegerFromBuffer(StringRef Buffer, size_t pos) {
 }
 
 /// Write 8-byte integers to a buffer in little-endian format.
-static void Write8byteIntegerToBuffer(raw_fd_ostream &OS, uint64_t Val) {
+static void Write8byteIntegerToBuffer(raw_ostream &OS, uint64_t Val) {
   llvm::support::endian::write(OS, Val, llvm::support::little);
 }
 
@@ -435,7 +446,7 @@ class BinaryFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
 
     // Compute size of the header.
@@ -472,19 +483,27 @@ class BinaryFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     CurWriteBundleTarget = TargetTriple.str();
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     auto BI = BundlesInfo[CurWriteBundleTarget];
-    OS.seek(BI.Offset);
+
+    // Pad with 0 to reach specified offset.
+    size_t CurrentPos = OS.tell();
+    size_t PaddingSize = BI.Offset > CurrentPos ? BI.Offset - CurrentPos : 0;
+    for (size_t I = 0; I < PaddingSize; ++I)
+      OS.write('\0');
+    assert(OS.tell() == BI.Offset);
+
     OS.write(Input.getBufferStart(), Input.getBufferSize());
+
     return Error::success();
   }
 };
@@ -607,7 +626,7 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
     assert(BundlerConfig.HostInputIndex != ~0u &&
            "Host input index not defined.");
@@ -617,12 +636,16 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     ++NumberOfProcessedInputs;
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
+    return Error::success();
+  }
+
+  Error finalizeOutputFile() final {
     assert(NumberOfProcessedInputs <= NumberOfInputs &&
            "Processing more inputs that actually exist!");
     assert(BundlerConfig.HostInputIndex != ~0u &&
@@ -640,10 +663,6 @@ class ObjectFileHandler final : public FileHandler {
     assert(BundlerConfig.ObjcopyPath != "" &&
            "llvm-objcopy path not specified");
 
-    // We write to the output file directly. So, we close it and use the name
-    // to pass down to llvm-objcopy.
-    OS.close();
-
     // Temporary files that need to be removed.
     TempFileHandlerRAII TempFiles;
 
@@ -684,7 +703,7 @@ class ObjectFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     return Error::success();
   }
 
@@ -781,22 +800,22 @@ class TextFileHandler final : public FileHandler {
     return Error::success();
   }
 
-  Error WriteHeader(raw_fd_ostream &OS,
+  Error WriteHeader(raw_ostream &OS,
                     ArrayRef<std::unique_ptr<MemoryBuffer>> Inputs) final {
     return Error::success();
   }
 
-  Error WriteBundleStart(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleStart(raw_ostream &OS, StringRef TargetTriple) final {
     OS << BundleStartString << TargetTriple << "\n";
     return Error::success();
   }
 
-  Error WriteBundleEnd(raw_fd_ostream &OS, StringRef TargetTriple) final {
+  Error WriteBundleEnd(raw_ostream &OS, StringRef TargetTriple) final {
     OS << BundleEndString << TargetTriple << "\n";
     return Error::success();
   }
 
-  Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) final {
+  Error WriteBundle(raw_ostream &OS, MemoryBuffer &Input) final {
     OS << Input.getBuffer();
     return Error::success();
   }
@@ -881,6 +900,184 @@ CreateFileHandler(MemoryBuffer &FirstInput,
                            "'" + FilesType + "': invalid file type specified");
 }
 
+OffloadBundlerConfig::OffloadBundlerConfig() {
+  auto IgnoreEnvVarOpt =
+      llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_IGNORE_ENV_VAR");
+  if (IgnoreEnvVarOpt.has_value() && IgnoreEnvVarOpt.value() == "1")
+    return;
+
+  auto VerboseEnvVarOpt = llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_VERBOSE");
+  if (VerboseEnvVarOpt.has_value())
+    Verbose = VerboseEnvVarOpt.value() == "1";
+
+  auto CompressEnvVarOpt =
+      llvm::sys::Process::GetEnv("OFFLOAD_BUNDLER_COMPRESS");
+  if (CompressEnvVarOpt.has_value())
+    Compress = CompressEnvVarOpt.value() == "1";
+}
+
+llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
+                                  bool Verbose) {
+  llvm::Timer HashTimer("Hash Calculation Timer", "Hash calculation time",
+                        ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    HashTimer.startTimer();
+  llvm::MD5 Hash;
+  llvm::MD5::MD5Result Result;
+  Hash.update(Input.getBuffer());
+  Hash.final(Result);
+  uint64_t TruncatedHash = Result.low();
+  if (Verbose)
+    HashTimer.stopTimer();
+
+  SmallVector<uint8_t, 0> CompressedBuffer;
+  auto BufferUint8 = llvm::ArrayRef<uint8_t>(
+      reinterpret_cast<const uint8_t *>(Input.getBuffer().data()),
+      Input.getBuffer().size());
+
+  llvm::compression::Format CompressionFormat;
+
+  if (llvm::compression::zstd::isAvailable())
+    CompressionFormat = llvm::compression::Format::Zstd;
+  else if (llvm::compression::zlib::isAvailable())
+    CompressionFormat = llvm::compression::Format::Zlib;
+  else
+    return createStringError(llvm::inconvertibleErrorCode(),
+                             "Compression not supported");
+
+  llvm::Timer CompressTimer("Compression Timer", "Compression time",
+                            ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    CompressTimer.startTimer();
+  llvm::compression::compress(CompressionFormat, BufferUint8, CompressedBuffer);
+  if (Verbose)
+    CompressTimer.stopTimer();
+
+  uint16_t CompressionMethod = static_cast<uint16_t>(CompressionFormat);
+  uint32_t UncompressedSize = Input.getBuffer().size();
+
+  SmallVector<char, 0> FinalBuffer;
+  FinalBuffer.append(MagicNumber.begin(), MagicNumber.end());
+  FinalBuffer.append(reinterpret_cast<const char *>(&Version),
+                     reinterpret_cast<const char *>(&Version) +
+                         sizeof(Version));
+  FinalBuffer.append(reinterpret_cast<char *>(&CompressionMethod),
+                     reinterpret_cast<char *>(&CompressionMethod) +
+                         sizeof(CompressionMethod));
+  FinalBuffer.append(reinterpret_cast<char *>(&UncompressedSize),
+                     reinterpret_cast<char *>(&UncompressedSize) +
+                         sizeof(UncompressedSize));
+  FinalBuffer.append(reinterpret_cast<char *>(&TruncatedHash),
+                     reinterpret_cast<char *>(&TruncatedHash) +
+                         sizeof(TruncatedHash));
+  FinalBuffer.append(CompressedBuffer.begin(), CompressedBuffer.end());
+
+  if (Verbose) {
+    auto MethodUsed =
+        CompressionFormat == llvm::compression::Format::Zstd ? "zstd" : "zlib";
+    llvm::errs() << "Compressed bundle format version: " << Version << "\n"
+                 << "Compression method used: " << MethodUsed << "\n"
+                 << "Binary size before compression: " << UncompressedSize
+                 << " bytes\n"
+                 << "Binary size after compression: " << CompressedBuffer.size()
+                 << " bytes\n"
+                 << "Truncated MD5 hash: "
+                 << llvm::format_hex(TruncatedHash, 16) << "\n";
+  }
+
+  return llvm::MemoryBuffer::getMemBufferCopy(
+      llvm::StringRef(FinalBuffer.data(), FinalBuffer.size()));
+}
+
+llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
+CompressedOffloadBundle::decompress(const llvm::MemoryBuffer &Input,
+                                    bool Verbose) {
+
+  StringRef Blob = Input.getBuffer();
+
+  if (Blob.size() < HeaderSize) {
+    return llvm::MemoryBuffer::getMemBufferCopy(Blob);
+  }
+  StringRef Magic = Blob.substr(0, MagicSize);
+  if (Magic != MagicNumber) {
+    if (Verbose)
+      llvm::errs() << "Uncompressed bundle.\n";
+    return llvm::MemoryBuffer::getMemBufferCopy(Blob);
+  }
+
+  uint16_t ThisVersion = *reinterpret_cast<const uint16_t *>(
+      Input.getBuffer().data() + MagicNumber.size());
+  uint16_t CompressionMethod = *reinterpret_cast<const uint16_t *>(
+      Blob.data() + MagicSize + VersionFieldSize);
+  uint32_t UncompressedSize = *reinterpret_cast<const uint32_t *>(
+      Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize);
+  uint64_t StoredHash = *reinterpret_cast<const uint64_t *>(
+      Blob.data() + MagicSize + VersionFieldSize + MethodFieldSize +
+      SizeFieldSize);
+
+  llvm::compression::Format CompressionFormat;
+  if (CompressionMethod ==
+      static_cast<uint16_t>(llvm::compression::Format::Zlib))
+    CompressionFormat = llvm::compression::Format::Zlib;
+  else if (CompressionMethod ==
+           static_cast<uint16_t>(llvm::compression::Format::Zstd))
+    CompressionFormat = llvm::compression::Format::Zstd;
+  else
+    return createStringError(inconvertibleErrorCode(),
+                             "Unknown compressing method");
+
+  llvm::Timer DecompressTimer("Decompression Timer", "Decompression time",
+                              ClangOffloadBundlerTimerGroup);
+  if (Verbose)
+    DecompressTimer.startTimer();
+
+  SmallVector<uint8_t, 0> DecompressedData;
+  StringRef CompressedData = Blob.substr(HeaderSize);
+  if (llvm::Error DecompressionError = llvm::compression::decompress(
+          CompressionFormat, llvm::arrayRefFromStringRef(CompressedData),
+          DecompressedData, UncompressedSize))
+    return createStringError(inconvertibleErrorCode(),
+                             "Could not decompress embedded file contents: " +
+                                 llvm::toString(std::move(DecompressionError)));
+
+  if (Verbose) {
+    DecompressTimer.stopTimer();
+
+    // Recalculate MD5 hash
+    llvm::Timer HashRecalcTimer("Hash Recalculation Timer",
+                                "Hash recalculation time",
+                                ClangOffloadBundlerTimerGroup);
+    HashRecalcTimer.startTimer();
+    llvm::MD5 Hash;
+    llvm::MD5::MD5Result Result;
+    Hash.update(llvm::ArrayRef<uint8_t>(DecompressedData.data(),
+                                        DecompressedData.size()));
+    Hash.final(Result);
+    uint64_t RecalculatedHash = Result.low();
+    HashRecalcTimer.stopTimer();
+    bool HashMatch = (StoredHash == RecalculatedHash);
+
+    llvm::errs() << "Compressed bundle format version: " << ThisVersion << "\n"
+                 << "Decompression method: "
+                 << (CompressionFormat == llvm::compression::Format::Zlib
+                         ? "zlib"
+                         : "zstd")
+                 << "\n"
+                 << "Size before decompression: " << CompressedData.size()
+                 << " bytes\n"
+                 << "Size after decompression: " << UncompressedSize
+                 << " bytes\n"
+                 << "Stored hash: " << llvm::format_hex(StoredHash, 16) << "\n"
+                 << "Recalculated hash: "
+                 << llvm::format_hex(RecalculatedHash, 16) << "\n"
+                 << "Hashes match: " << (HashMatch ? "Yes" : "No") << "\n";
+  }
+
+  return llvm::MemoryBuffer::getMemBufferCopy(
+      llvm::toStringRef(DecompressedData));
+}
+
 // List bundle IDs. Return true if an error was found.
 Error OffloadBundler::ListBundleIDsInFile(
     StringRef InputFileName, const OffloadBundlerConfig &BundlerConfig) {
@@ -890,28 +1087,35 @@ Error OffloadBundler::ListBundleIDsInFile(
   if (std::error_cod...
[truncated]
 | 
| ping A little background: some HIP applications face size restrictions for their packages. Compressing can significantly reduce the bundle size with acceptable overhead of uncompressing. Therefore they need this feature. | 
| Unrelated, I need to hurry up and try to make the new driver the default for CUDA and HIP upstream at some point. I've had some thoughts about how to accomplish this in the past. For the new driver,  @MaskRay do you know if ELF compression is feasible for this application? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other offload options use -- so we should probably stay consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to --offload-compress
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: can we make sure --offload is not language specific.
The hip in the title makes me worry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is to introduce a generic option which works for a specific offloading language, and the help message indicates which languages are supported. It is not feasible to support a feature in all offloading languages at once by one developer since a developer is usually only familiar with one offloading language. Once a feature is introduced for one offloading language, it can be extended or adopted by other offloading languages. This is better than each offloading language introducing an individual option for the same purpose.
| 
 We need to support compression/uncompression of standalone bundled bitcode, and standalone bundled code objects that are not embedded in ELF or COFF binaries. An OS-neutral binary format is preferred. | 
| 
 I think my ideal solution would be LLVM supporting some module metadata such that a section emitted by the backend automatically uses an ELF compressed section, though this wouldn't apply to LTO objects unfortunately. That being said, it's not unreasonable to do this manually for this specified use-case. I see this introduces a new header to the bundled format, will this break any backwards compatibility? I'm guessing old binaries can check for the LLVM bitcode, ELF, or COFF magic directly instead. If that's the case I'd probably recommend adding your magic number to the  | 
| 
 Good point. The compiler and runtime are able to consume the offload bundle files created by the old compiler. If compression is not enabled, the old compiler and runtime will be able to consume it. However, old compiler and runtime will not be able to consume the compressed bundle. I have added recognizing the clang offload bundle (compressed and uncompressed) to the magic number. | 
        
          
                clang/lib/Driver/OffloadBundler.cpp
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be easier with a raw_svector_ostream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. fixed
Add option -f[no-]offload-compress to clang to enable/disable compression of device binary for HIP. By default it is disabled. Add option -compress to clang-offload-bundler to enable compression of offload bundle. By default it is disabled. When enabled, zstd or zlib is used for compression when available. When disabled, it is NFC compared to previous behavior. The same offload bundle format is used as before. Clang-offload-bundler automatically detects whether the input file to be unbundled is compressed and the compression method and decompress if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the long term we need to move away from the bundler, which would suggest we probably want something similar in the "new driver". But overall the changes here are pretty self contained and the concept is straightforward.
This reverts commit a1e81d2. Revert "Fix test hip-offload-compress-zlib.hip" This reverts commit ba01ce6. Revert due to sanity fail at https://lab.llvm.org/buildbot/#/builders/5/builds/37188 https://lab.llvm.org/buildbot/#/builders/238/builds/5955 /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment 0xaaaae2d90e7c: note: pointer points here bc 00 00 00 94 dc 29 9a 89 fb ca 2b 78 9c 8b 8f 77 f6 71 f4 73 8f f7 77 73 f3 f1 77 74 89 77 0a ^ #0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25 #1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7 Will reland after fixing it.
| Hi, I noticed that with UBSan this new testcase fails with  | 
| [AMD Official Use Only - General]
I have reverted the commit. Will fix and reland. Thanks.
Sam
From: mikaelholmen ***@***.***>
Sent: Thursday, October 5, 2023 4:05 AM
To: llvm/llvm-project ***@***.***>
Cc: Liu, Yaxun (Sam) ***@***.***>; State change ***@***.***>
Subject: Re: [llvm/llvm-project] [HIP] Support compressing device binary (PR #67162)
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
Hi,
I noticed that with UBSan this new testcase fails
09:59:08 Failed Tests (1):
09:59:08   Clang :: Driver/clang-offload-bundler-zlib.c
with
09:39:53 ../../clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0x55ceca16e46c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
09:39:53 0x55ceca16e46c: note: pointer points here
09:39:53   bc 00 00 00 94 dc 29 9a  89 fb ca 2b 78 9c 8b 8f  77 f6 71 f4 73 8f f7 77  73 f3 f1 77 74 89 77 0a
09:39:53               ^
09:39:53 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../clang/lib/Driver/OffloadBundler.cpp:1012:25 in
—
Reply to this email directly, view it on GitHub<#67162 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABO4LZ4HUWWX6BLSBJBNBXTX5ZS3HAVCNFSM6AAAAAA5DLFYLGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONBYGMZTAMJYGY>.
You are receiving this because you modified the open/close state.Message ID: ***@***.******@***.***>> | 
| For reference, I've had similar problems in the past. Usually it happens because the header needs to be aligned on a  | 
| [AMD Official Use Only - General]
Thanks. I will fix it by memcpy instead of dereferencing int* casted from char*.
Sam
From: Joseph Huber ***@***.***>
Sent: Thursday, October 5, 2023 9:23 AM
To: llvm/llvm-project ***@***.***>
Cc: Liu, Yaxun (Sam) ***@***.***>; State change ***@***.***>
Subject: Re: [llvm/llvm-project] [HIP] Support compressing device binary (PR #67162)
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
For reference, I've had similar problems in the past. Usually it happens because the header needs to be aligned on a uint64_t to be read. The ELF section has 8-byte alignment, but that can be incorrect when extracting from something like an archive. I just make a copy in that case https://github.com/llvm/llvm-project/blob/main/llvm/lib/Object/OffloadBinary.cpp#L156.
—
Reply to this email directly, view it on GitHub<#67162 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABO4LZ3TH2F7AQWO2CQH7ZTX52YBPAVCNFSM6AAAAAA5DLFYLGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONBYHA4DSNRXGI>.
You are receiving this because you modified the open/close state.Message ID: ***@***.******@***.***>> | 
Original PR: #67162 The commit was reverted due to UB detected by santizer: https://lab.llvm.org/buildbot/#/builders/238/builds/5955 clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment It was fixed by using memcpy instead of dereferencing int* casted from unaligned char*.
Reland "[HIP] Support compressing device binary" Original PR: llvm#67162 The commit was reverted due to UB detected by santizer: https://lab.llvm.org/buildbot/#/builders/238/builds/5955 clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment It was fixed by using memcpy instead of dereferencing int* casted from unaligned char*. Co-Authored-By: Martin Schwaighofer <[email protected]> (only did the backport)
Original PR: llvm#67162 The commit was reverted due to UB detected by santizer: https://lab.llvm.org/buildbot/#/builders/238/builds/5955 clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment It was fixed by using memcpy instead of dereferencing int* casted from unaligned char*. Note from committer: The original patch is splitted into two parts, one only for clang, and the other only for LLVM. This is to allow easier packaging for Nix. Signed-off-by: Gavin Zhao <[email protected]>
Original PR: llvm#67162 The commit was reverted due to UB detected by santizer: https://lab.llvm.org/buildbot/#/builders/238/builds/5955 clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment It was fixed by using memcpy instead of dereferencing int* casted from unaligned char*. Note from committer: The original patch is splitted into two parts, one only for clang, and the other only for LLVM. This is to allow easier packaging for Nix. Signed-off-by: Gavin Zhao <[email protected]>
Original PR: llvm#67162 The commit was reverted due to UB detected by santizer: https://lab.llvm.org/buildbot/#/builders/238/builds/5955 clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment It was fixed by using memcpy instead of dereferencing int* casted from unaligned char*.
Add option -f[no-]offload-compress to clang to enable/disable compression of device binary for HIP. By default it is disabled.
Add option -compress to clang-offload-bundler to enable compression of offload bundle. By default it is disabled.
When enabled, zstd or zlib is used for compression when available.
When disabled, it is NFC compared to previous behavior. The same offload bundle format is used as before.
Clang-offload-bundler automatically detects whether the input file to be unbundled is compressed and the compression method and decompress if necessary.