[Bugfix][TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule #15484
`topi.arm_cpu.schedule_conv2d_NHWC_quantized_interleaved` was failing compilation with the `+i8mm` extension enabled (as done in #14888) whenever the output height and output width were both equal to 1, such that `OH x OW = 1`.

Padding was being removed during the `tir.BufferShapeLegalize` pass, causing an error in the `tir.BufferBindUnwrapper` pass. Some of the removed padding was necessary for tensorize (using the `gemm_acc_2x2_int8_int8_int32` intrinsic), which expects 2x2 output tiles. However, because this padding was removed, the output tensor `C_interleaved` was reduced to 1x2 tiles instead. For example, for `A = [1x1x1x8]`, `W = [1x1x8x24]`, `C = [1x1x1x24]`:
- Before this fix: `C_interleaved = T.Buffer((1, 1, 2, 1, 6, 1, 2), "int32")`
- With this fix: `C_interleaved = T.Buffer((1, 1, 2, 1, 6, 2, 2), "int32")`

To make sure the required padding is left untouched while the rest of it is still removed, a dummy reference to the needed axis is declared.
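As a rough illustration of why the 1x2-tiled buffer cannot feed a 2x2 intrinsic, the arithmetic below is a minimal sketch. It assumes that the GEMM view of the convolution has `M = OH x OW` rows and `N = 24` columns, and that each dimension is padded up to a multiple of 2 to match the 2x2 output tiles of `gemm_acc_2x2_int8_int8_int32`. The helpers `padded_gemm_dims` and `num_elements` are hypothetical and not part of TVM; the real TOPI schedule may pad differently.

```python
import math


def padded_gemm_dims(oh, ow, n, tile_m=2, tile_n=2):
    """Return (M, M_padded, N_padded) for a GEMM with M = OH*OW rows and N columns.

    Hypothetical helper: pads each dimension up to a multiple of the tile size
    (2x2 here, assumed from the intrinsic's name).
    """
    m = oh * ow
    m_pad = math.ceil(m / tile_m) * tile_m
    n_pad = math.ceil(n / tile_n) * tile_n
    return m, m_pad, n_pad


def num_elements(shape):
    """Hypothetical helper: total number of elements in a buffer shape."""
    return math.prod(shape)


# Example from the description: A = [1x1x1x8], W = [1x1x8x24], C = [1x1x1x24]
m, m_pad, n_pad = padded_gemm_dims(oh=1, ow=1, n=24)
required = m_pad * n_pad  # elements needed when every output tile is 2x2

broken = (1, 1, 2, 1, 6, 1, 2)  # last two axes form 1x2 tiles
fixed = (1, 1, 2, 1, 6, 2, 2)   # last two axes form 2x2 tiles

print(m, m_pad, n_pad)                                      # 1 2 24
print(num_elements(broken), num_elements(fixed), required)  # 24 48 48
```

Under these assumptions, the broken buffer holds exactly `OH x OW x N = 24` values, leaving no padded row for the second row of each 2x2 tile, while the fixed buffer matches the 48 values that the 2x2 tiling needs.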
In the end, the leftover padding is still disregarded when computing the final output tensor `C`.
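Discarding the leftover padding can be pictured with plain Python lists. This is a minimal sketch that assumes a batch of 1 and a padded GEMM result laid out as `M_padded x N` rows (here 2 x 24). It recovers `C` by keeping only the first `OH x OW` rows and reshaping them to NHWC. The function `unpad_output` is hypothetical and does not reproduce TVM's actual compute definition for `C`.

```python
def unpad_output(padded, oh, ow, n):
    """Drop padding rows/columns from an (M_padded x N_padded) GEMM result.

    Hypothetical helper, assuming batch = 1: only the first OH*OW rows and the
    first n columns are real outputs. Returns a nested list shaped [1][OH][OW][N].
    """
    m = oh * ow
    valid = [row[:n] for row in padded[:m]]  # keep real rows/columns only
    # Row index h * ow + w of the GEMM output corresponds to pixel (h, w)
    return [[[valid[h * ow + w] for w in range(ow)] for h in range(oh)]]


# 2 x 24 padded result: row 0 holds real outputs, row 1 is padding (-1 here)
padded = [list(range(24)), [-1] * 24]
c = unpad_output(padded, oh=1, ow=1, n=24)
print(len(c), len(c[0]), len(c[0][0]), len(c[0][0][0]))  # 1 1 1 24
print(c[0][0][0][:4])                                    # [0, 1, 2, 3]
```

The padded row is needed only so that each tensorized tile is a full 2x2 block. It never reaches the `[1x1x1x24]` output.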