eddyb
Member

@eddyb eddyb commented Sep 8, 2025

More recent Vulkan SDK versions have started complaining about types having explicit memory layout, when used with memory that doesn't inherently require explicit layouts (only push constants and buffers really do), e.g.:

error: line 1097: [VUID-StandaloneSpirv-None-10684] Invalid explicit layout decorations on type for operand '367[%_ptr_Function__struct_61]'
  %1207 = OpVariable %_ptr_Function__struct_61 Function

The solution used here is a pass that erases the explicit layout decorations from types behind pointers in storage classes that don't support them, and updates the affected loads/stores to match. It is simpler and more specialized than what I originally described in #266 (comment), so we can hopefully use this for now and not worry about spirt::{mem,qptr} for the next release.
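
To make the idea of the pass concrete, here is a minimal sketch of the filtering step only. Everything in it is a hypothetical stand-in (the `StorageClass` and `Decoration` enums, `allows_explicit_layout`, and this toy `erase_when_invalid`), not the SPIR-T types or the code in this PR. The rule it encodes is the simplified one from the description above (only push constants and buffers keep explicit layouts), and it does not model rewriting loads/stores.

```rust
// Toy model only: hypothetical stand-ins, not SPIR-T or rspirv types.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageClass {
    Function,
    Private,
    Workgroup,
    PushConstant,
    StorageBuffer,
    Uniform,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Decoration {
    Offset(u32),
    ArrayStride(u32),
    MatrixStride(u32),
    // Anything unrelated to memory layout.
    Other(&'static str),
}

// Assumption: only push constants and buffers keep explicit layouts.
// This mirrors the PR description, not the full validation rules
// (e.g. capabilities that change what Workgroup accepts are ignored).
fn allows_explicit_layout(sc: StorageClass) -> bool {
    matches!(
        sc,
        StorageClass::PushConstant | StorageClass::StorageBuffer | StorageClass::Uniform
    )
}

fn is_explicit_layout(d: Decoration) -> bool {
    matches!(
        d,
        Decoration::Offset(_) | Decoration::ArrayStride(_) | Decoration::MatrixStride(_)
    )
}

// Returns the decorations that a type behind a pointer in `sc` may keep.
fn erase_when_invalid(sc: StorageClass, decorations: &[Decoration]) -> Vec<Decoration> {
    if allows_explicit_layout(sc) {
        return decorations.to_vec();
    }
    decorations
        .iter()
        .copied()
        .filter(|&d| !is_explicit_layout(d))
        .collect()
}
```

With this model, a type used behind a `Function` pointer loses `Offset`/`ArrayStride`/`MatrixStride` but keeps unrelated decorations, while the same type behind a `StorageBuffer` pointer is left untouched.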

@eddyb eddyb marked this pull request as draft September 8, 2025 12:30
@eddyb eddyb force-pushed the erase-explicit-layout-sometimes-maybe branch 3 times, most recently from 0fd77cb to 707bdcc Compare September 9, 2025 08:45
@eddyb eddyb force-pushed the erase-explicit-layout-sometimes-maybe branch 2 times, most recently from 53cf51d to 8d01e85 Compare September 18, 2025 12:52
@eddyb eddyb force-pushed the erase-explicit-layout-sometimes-maybe branch from 8d01e85 to 84ed68f Compare September 18, 2025 16:49
@Firestar99
Member

I just want to note that you're upgrading CI's Vulkan SDK to 1.4.321.0, but use-compiled-tools is still on 1.4.309.0, so we could have a scenario where the old SDK fails where the new SDK passes. I know that upgrading spirv-tools isn't trivial this time, so I would let this pass as is, just something to be aware of.

@eddyb
Member Author

eddyb commented Sep 19, 2025

TODO: before landing this, we should update spirv-tools-rs as well (I only changed the Vulkan SDK version that is used for CI, via the use-installed-tools feature).

@Firestar99 I 100% agree, that's why I still have the above quoted bit in the PR description, and it's why this PR was in draft state even before rebasing it on top of #400 (I know it'd be slower but it'd be nice to test use-compiled-tools as well, not just use-installed-tools, in CI).

EDIT: fixed the spirv-tools-rs upgrade and landed it:

@eddyb eddyb force-pushed the erase-explicit-layout-sometimes-maybe branch from 84ed68f to 5a77372 Compare September 19, 2025 06:53
@eddyb
Member Author

eddyb commented Sep 19, 2025

As the spirv-tools-rs update PR was merged:

I was able to get 10 failures in cargo compiletest by using it and turning off the new pass:

diff --git a/Cargo.toml b/Cargo.toml
index 7cf09984100..9da910407c9 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -49,3 +49,3 @@ spirv-std-types = { path = "./crates/spirv-std/shared", version = "=0.9.0" }
 spirv-std-macros = { path = "./crates/spirv-std/macros", version = "=0.9.0" }
-spirv-tools = { version = "0.12.1", default-features = false }
+spirv-tools = { version = "0.12.1", git = "https://github.com/Rust-GPU/spirv-tools-rs", default-features = false }
 rustc_codegen_spirv = { path = "./crates/rustc_codegen_spirv", version = "=0.9.0", default-features = false }
diff --git a/crates/rustc_codegen_spirv/src/linker/mod.rs b/crates/rustc_codegen_spirv/src/linker/mod.rs
index d289c8e5fce..89c46e11f03 100644
--- a/crates/rustc_codegen_spirv/src/linker/mod.rs
+++ b/crates/rustc_codegen_spirv/src/linker/mod.rs
@@ -564,3 +564,3 @@ pub fn link(
 
-        {
+        if false {
             let timer = before_pass("spirt_passes::explicit_layout::erase_when_invalid");

Re-enabling the pass (removing if false), while keeping the newer spirv-tools-rs, passes cargo compiletest.


So we should be able to land this once we release another version of spirv-tools-rs, AIUI.

@Firestar99
Member

Should I release spirv-tools for you?

@eddyb
Member Author

eddyb commented Sep 24, 2025

Should I release spirv-tools for you?

Yes, that would be great, thanks!
(sorry for not being clearer, and also for missing the notif for this - alternatively I could handle the release if it's easier for you, but I was worried about subtleties around that specific project, since I don't think we have a very unified flow across repos yet)

@Firestar99
Member

(welp and I completely missed your message)

Anyway, I thought I'd give spirv-tools some love before we release:

Afterwards, a release should just be a trivial cargo release minor in the root workspace

@eddyb eddyb force-pushed the erase-explicit-layout-sometimes-maybe branch from 5a77372 to 9b425ac Compare October 7, 2025 08:49
@eddyb eddyb marked this pull request as ready for review October 7, 2025 08:49
@eddyb eddyb enabled auto-merge October 7, 2025 08:50
@eddyb
Member Author

eddyb commented Oct 7, 2025

🎯 [Cache] Restored Vulkan SDK in path: '/home/runner/vulkan-sdk'. Cache Restore ID: 'cache-linux-x64-vulkan-sdk-1.4.309.0'.
Warning: Vulkan SDK path doesn't exist: /home/runner/vulkan-sdk/1.4.321.0/x86_64
Warning: Could not find Vulkan SDK in /home/runner/vulkan-sdk/1.4.321.0/x86_64

This is very confusing. It doesn't seem to actually ignore the cache if it's the wrong version. Not sure why only Linux seems to be affected. Will purge GHA caches for old version and see what happens.

Member

@Firestar99 Firestar99 left a comment

I feel like I can't really comment too much on the code, as I'm totally unfamiliar with spirt. I should probably start reading into the library a bit at some point, since I assume most of our current passes will eventually end up in spirt anyway.

}
}

fn in_place_transform_global_var_decl(&mut self, gv_decl: &mut GlobalVarDecl) {
Member

spirt question: I don't fully understand why some visitor functions are *_in_place and others are not, returning Transformed<T> instead

Member Author

It's a combination of 2-3 things:

  • wanting Transformed's distinction of whether a change is actually needed, to bypass work
    (but that requires by-value returns to avoid misuse)
  • anything interned (AttrSet/Const/Type) can't do in-place mutation
    (so they use &FooDef -> Transformed<FooDef> + only re-interning when changed)
  • module-owned entities (GlobalVar/Func and intra-func regions/nodes) can do in-place mutation
    (and cloning the whole definition to return it via Transformed would be prohibitively expensive)
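
The split described in the bullets above can be sketched with a toy interner. This is only an illustration under stated assumptions: `Transformed`, `Interner`, `GlobalVarDecl`, and the "+layout" string convention are all hypothetical stand-ins defined here, not SPIR-T's actual types or API.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the idea behind `Transformed<T>`:
// the by-value return tells the caller whether any work is needed.
#[derive(Debug, PartialEq)]
enum Transformed<T> {
    Unchanged,
    Changed(T),
}

// Toy interner: definitions are immutable once interned, handles are indices.
#[derive(Default)]
struct Interner {
    defs: Vec<String>,
    lookup: HashMap<String, usize>,
}

impl Interner {
    fn intern(&mut self, def: String) -> usize {
        if let Some(&id) = self.lookup.get(&def) {
            return id;
        }
        let id = self.defs.len();
        self.defs.push(def.clone());
        self.lookup.insert(def, id);
        id
    }
}

// Interned case: borrow the definition, build a new one only if it changes.
fn strip_layout_suffix(def: &str) -> Transformed<String> {
    match def.strip_suffix("+layout") {
        Some(stripped) => Transformed::Changed(stripped.to_string()),
        None => Transformed::Unchanged,
    }
}

// Re-intern only when something changed; otherwise reuse the old handle.
fn transform_interned(interner: &mut Interner, id: usize) -> usize {
    let result = strip_layout_suffix(&interner.defs[id]);
    match result {
        Transformed::Unchanged => id,
        Transformed::Changed(new_def) => interner.intern(new_def),
    }
}

// Module-owned case: mutate in place, with no clone of the whole declaration.
struct GlobalVarDecl {
    type_id: usize,
}

fn in_place_transform_global_var(interner: &mut Interner, gv: &mut GlobalVarDecl) {
    gv.type_id = transform_interned(interner, gv.type_id);
}
```

In this sketch the interned "type" can never be edited through its handle, so the only way to change it is to produce a new definition and intern that, whereas the global variable declaration is owned and can simply have its field overwritten.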

assert_eq!(sc_kind, wk.StorageClass);
!self.addr_space_allows_explicit_layout(AddrSpace::SpvStorageClass(sc))
}
_ => unreachable!(),
Member

spirt question: I also have no idea if this unreachable is actually unreachable, or how exactly imms and type_and_const_inputs work, since they're unfortunately not documented.

Member Author

A SPIR-V "immediate" (in the spirt::spv::Imm sense) is similar to rspirv::dr::Operand w/o the Id variants (as SPIR-V IDs always correspond to some Type/Const/GlobalVar/Func/etc. interned/owned entity in SPIR-T - in the case of types, type_and_const_inputs lets them refer to other Types and/or Consts, through SPIR-V IDs, but nothing else).

So hitting this unreachable!() would require interning a type with a spirt::TypeKind::SpvInst that:

  • couldn't naturally be produced by SPIR-V -> SPIR-T (spirt::spv::lower)
  • would fail SPIR-T -> SPIR-V (spirt::spv::lift)
  • might error/panic in other parts of SPIR-T (that e.g. decode a spv::Inst's spv::Imms according to the SPIR-V "grammar")

Member

@Firestar99 Firestar99 Oct 9, 2025

From what I could glean from spirt source:

  • spirt does not use the vulkan headers to extract spec information
    • instead of running codegen from spec detail, it treats spec information as runtime values
    • only a few basic types that are needed for legalization are found in the source code
    • the rest of the operations are just forwarded as unknown entities it won't touch (like texture accesses)
    • you are still including the spec and loading it at runtime to know all possible instructions and what type they are (OpType, OpConst) for lift and lower, without actually knowing exactly what they do
  • Imm::Short(u32) is sufficient for all usual operands, as they are all just u32-sized enums
  • Exception: string literals, for which you use one Imm::LongStart with many Imm::LongCont[inued] to encode a stream of N-many bytes
    • I have not found any other case using Long

Thoughts

  • With Imm::Long* rarely used but Imm::Short ubiquitous, I wonder if an Imm::unwrap_short() could make code more readable? And maintainable, since it would allow you to replace the backing datastructure later without much code breakage.
  • With Imm::Long being so rare and I assume untouched by spirt, I wonder if a plain Vec<u8> or &'cx [u8] is more efficient and simpler? That would also make OpString only have 2 Imms, which makes it fit in Inst.imms: SmallVec<[Imm; 2]> and only be one indirection to access, as previously.
pub enum Imm {
    Short(spec::OperandKind, u32),
    Long(spec::OperandKind, Vec<u32>),
}
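
For reference, the `Imm::unwrap_short()` suggestion could look roughly like the sketch below. It is written against a toy `Imm` with the three variants mentioned above, and `OperandKind`, `unwrap_short`, and `encode_string` are hypothetical names defined here for illustration, not existing spirt API. The string packing follows the SPIR-V rule for literal strings (UTF-8, nul-terminated, first byte in the lowest-order bits of each word); whether the real library uses `LongStart` for a string that fits in one word is not something this sketch tries to match.

```rust
// Hypothetical stand-in for an operand kind taken from the SPIR-V grammar.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OperandKind(u8);

// Toy version of the three-variant shape discussed above.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Imm {
    Short(OperandKind, u32),
    LongStart(OperandKind, u32),
    LongCont(OperandKind, u32),
}

impl Imm {
    // The suggested helper (hypothetical): the word of a single-word immediate.
    // Panics on multi-word immediates, like `Option::unwrap` panics on `None`.
    fn unwrap_short(self) -> u32 {
        match self {
            Imm::Short(_, word) => word,
            other => panic!("expected Imm::Short, found {other:?}"),
        }
    }
}

// Packs a string literal into words: UTF-8 bytes, nul-terminated,
// zero-padded to a multiple of 4 bytes, little-endian within each word.
// Simplification: always starts with `LongStart`, even for a single word.
fn encode_string(kind: OperandKind, s: &str) -> Vec<Imm> {
    let mut bytes = s.as_bytes().to_vec();
    bytes.push(0);
    while bytes.len() % 4 != 0 {
        bytes.push(0);
    }
    bytes
        .chunks(4)
        .enumerate()
        .map(|(i, c)| {
            let word = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
            if i == 0 {
                Imm::LongStart(kind, word)
            } else {
                Imm::LongCont(kind, word)
            }
        })
        .collect()
}
```

A call site that only ever expects an enumerand could then write `imm.unwrap_short()` instead of matching on the variant each time, which is the readability gain the suggestion is after.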

Member Author

These are all good ideas (I also thought of interning immediates, and a few other possible designs, but without proper benchmarking I didn't want to start fine-tuning too much), could you copy that into an issue on the spirt repo? (it already feels formulated like close to that)

(this also reminds me there were a few issues on the old repo, idk if they got auto-migrated or not, I don't remember checking)


spirt does not use the vulkan headers to extract spec information
instead of running codegen from spec detail, it treats spec information as runtime values
...
the rest of the operations are just forwarded as unknown entities it won't touch (like texture accesses)
you are still including the spec and loading it at runtime

As for this part, the reason the JSON is still dynamically loaded is that I found some really neat ways to exploit the way Khronos assigns numbers (opcode, enumerands, etc. - see spirt::spv::spec::indexed), and some of that data could be neatly loaded as a binary blob (emitted by e.g. a build script), but it's hard to do it for everything and you easily run into build-host-vs-target potential issues etc.
(I know of at least one place tackling this kind of problem in its generality - icu4x and I believe the implementation is mostly in zerovec, which is very cool but somewhat daunting)

By now the SPIR-V spec is so large (in a "breadth" sense) that generating Rust code for everything feels quite wasteful - most of it is closer to schema/protocol/interface descriptions, and I still wish they had made the format self-descriptive for the most part.

let is_explicit_layout_decoration = match attr {
Attr::SpvAnnotation(attr_spv_inst)
if (attr_spv_inst.opcode == wk.OpDecorate
&& [wk.ArrayStride, wk.MatrixStride]
Member Author

Oh, phew, I was worried this only mentioned ArrayStride, but I did include MatrixStride just in case (I guess the one instance of non-explicit-layout matrices would have to be in the context of raytracing pipelines - e.g. Ctrl+F "matrix" on https://github.khronos.org/SPIRV-Registry/extensions/KHR/SPV_KHR_ray_tracing.html - though an even simpler example is just a local variable containing a matrix).

Member

I wanted to try out if I could somehow verify that MatrixStride is actually removed from the struct...

error: error:0:0 - Structure id 12 decorated as Block must be explicitly laid out with MatrixStride decorations.
         %_struct_12 = OpTypeStruct %mat4v3float

Well.. you can't remove what isn't there in the first place...

Our matrix support is really bad and has screw-ups similar to the Vecs before #380, but since matrices are barely useful I think it just isn't a priority rn. The use cases are:

  • that RT extension
  • there are a few basic matrix instructions in SPIR-V that we don't emit, building them from basic instructions instead; just search for OpTypeMatrix in the spec
    • OpTranspose
    • OpMatrixTimesScalar
    • OpVectorTimesMatrix
    • OpMatrixTimesVector
    • OpMatrixTimesMatrix
    • OpOuterProduct

@Firestar99 Firestar99 disabled auto-merge October 9, 2025 12:24
Member

@Firestar99 Firestar99 left a comment

I can't fully verify everything due to lack of knowledge on both spirt and pointer handling. But I can write you some compiletests to verify this patch works, since we have almost none for shared memory! Feel free to cherry-pick fe55d88 on top before merging (again, no push perms).

(Locally, I reversed the order of commits to have the spirv-tools upgrade first, then my compiletests and finally your fix, so I can test properly)

@eddyb eddyb enabled auto-merge October 11, 2025 09:16
@eddyb eddyb added this pull request to the merge queue Oct 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 11, 2025
@eddyb eddyb added this pull request to the merge queue Oct 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 11, 2025
@eddyb eddyb added this pull request to the merge queue Oct 11, 2025
Merged via the queue into Rust-GPU:main with commit a26e0df Oct 11, 2025
13 checks passed
@eddyb eddyb deleted the erase-explicit-layout-sometimes-maybe branch October 11, 2025 11:24
@nazar-pc
Contributor

CI failed on main with this PR merged into it

@eddyb
Member Author

eddyb commented Oct 12, 2025

CI failed on main with this PR merged into it

I was worried about that, I guess I should've kept an eye on it - it's just a cache filtering issue, for some reason on Linux it doesn't ignore the wrong version cache (even though I can't find a Linux/not-Linux conditional in that Vulkan SDK action, that could explain this).

Nuking old GHA caches and retrying, if it starts working it should keep working AFAICT.

EDIT: retry passed, that should be it for now.

Development

Successfully merging this pull request may close these issues.

  • spirv-tools v2025.2 makes compiletest fail
  • Workgroup doesn't work in MoltenVk

4 participants