@Wunkolo Wunkolo commented Sep 18, 2022

Uses the single-instruction AVX512 `vperm*` instructions to accelerate
the `INT8_TYPE` and `INT16_TYPE` permutation opcodes.

`INT8_TYPE` is accelerated using the `AVX512VBMI` subset of AVX512,
available since Ice Lake (Intel) and Zen 4 (AMD).

Passes the current unit tests as well as the `instr__gen_vperm.s` unit tests from #1348.

Allows access to byte-element two-register permutations (32-byte lookup
tables) and to 64-bit multi-shifts.
Added specifically to accelerate the assembly of our `PERMUTE`
opcode.
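
For readers unfamiliar with these operations, here is a minimal scalar sketch of what a two-register byte permutation (128-bit form) and a 64-bit multi-shift compute. It is a reference model only, not the code from this PR: `Vec128`, `Permute2Bytes`, and `MultiShift64` are hypothetical names chosen for illustration, and the hardware instructions do the same work in a single operation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector register holding 16 bytes.
using Vec128 = std::array<std::uint8_t, 16>;

// Scalar model of a two-register byte permute on 128-bit registers.
// The two sources together form a 32-byte lookup table: index bit 4
// selects the table half (lo or hi), index bits 3:0 select the byte.
// Index bits above bit 4 are ignored.
Vec128 Permute2Bytes(const Vec128& idx, const Vec128& lo, const Vec128& hi) {
  Vec128 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    const std::uint8_t sel = idx[i] & 0x1F;
    out[i] = (sel & 0x10) ? hi[sel & 0x0F] : lo[sel];
  }
  return out;
}

// Scalar model of a 64-bit multi-shift on a single quadword.
// Each control byte gives a bit offset (low 6 bits); the matching output
// byte is the 8 bits of `data` starting at that offset, wrapping around
// the 64-bit boundary (a rotate, not a plain shift).
std::uint64_t MultiShift64(std::uint64_t control, std::uint64_t data) {
  std::uint64_t out = 0;
  for (int i = 0; i < 8; ++i) {
    const unsigned shift = static_cast<unsigned>(control >> (8 * i)) & 63u;
    const std::uint64_t rotated =
        shift ? (data >> shift) | (data << (64 - shift)) : data;
    out |= (rotated & 0xFFu) << (8 * i);
  }
  return out;
}
```

With `lo` and `hi` holding two 16-byte halves of a table, an index byte of 31 picks `hi[15]` and an index byte of 3 picks `lo[3]`. A multi-shift control whose bytes are 0, 8, 16, ..., 56 reproduces the input quadword unchanged.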
@gibbed gibbed merged commit 5fde7c6 into xenia-project:master Oct 21, 2022
@Wunkolo Wunkolo deleted the avx512-permute branch October 21, 2022 15:19