Skip to content

Conversation

@RiverDave
Copy link
Owner

@RiverDave RiverDave commented Jul 19, 2025

The _mm_movepi8_mask intrinsic (and similar sign-bit checking intrinsics containing 8-bit integers) was being optimized away when using -fno-signed-char. Effectively replacing a cmp expression for 0.

Since unsigned values can never be less than zero, the CIR lowering was directly generating a constant 0 (I suppose we fold a vec filled with 0's to our target, which is a scalar mask, which in turn is 0) instead of the intended comparison operation, completely eliminating the icmp instruction.

See when passing as arg -fno-signed-char (no cmp generated):

OG:

define dso_local i16 @test_mm_movepi8_mask(<2 x i64> %0) #0 {
...
  %8 = bitcast <2 x i64> %7 to <16 x i8>
  %9 = icmp slt <16 x i8> %8, zeroinitializer
  store i16 %10, ptr %3, align 2
  ...
  ret i16 %12
}

CIR:

define dso_local i16 @test_mm_movepi8_mask(<2 x i64> %0) #0 {
...
  %8 = bitcast <2 x i64> %7 to <16 x i8>
  store i16 0, ptr %3, align 2 // I believe our vector cmp is folded here?
...
  ret i16 %10
}

Since integer signedness is something we can track on CIR, this optimization makes sense; however, if we want to preserve parity with OG, I believe this patch matches that behaviour.

Added special case detection for sign-bit extraction patterns (lt comparison with cir::ZeroAttr) to force signed comparison regardless of the element type's signedness. This preserves the semantic intent of checking sign bits rather than performing mathematical unsigned comparisons.

@RiverDave RiverDave marked this pull request as ready for review July 19, 2025 19:57
@RiverDave RiverDave changed the base branch from fix-vecmp-sext to main July 30, 2025 03:13
@RiverDave RiverDave added question Further information is requested and removed question Further information is requested labels Jul 30, 2025
@RiverDave RiverDave closed this Jul 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants