KleidiAI int4 kernels not loading properly on aarch64 Linux #2143

@vctrmn

Hello there!

I have been trying to follow your instructions to get KleidiAI int4 kernels working on a Scaleway ARM instance (4x16), but I'm still encountering issues.
I've done the following:

  • Built and installed KleidiAI (the library is installed at /usr/local/lib/libkleidiai.a)
  • Built torchao with the flags you mentioned: USE_CPP=1 TORCHAO_BUILD_CPU_AARCH64=1 TORCHAO_BUILD_KLEIDIAI=1 pip install .

However, when I try to run code that uses the KleidiAI kernels, I get this error:

AttributeError: '_OpNamespace' 'torchao' object has no attribute '_pack_8bit_act_4bit_weight'

Exception: TorchAO experimental kernels are not loaded. To install the kernels, run `USE_CPP=1 pip install .` from ao on a machine with an ARM CPU. You can also set target to 'aten' if you are using ARM CPU.

My CPU definitely has the required ARM features (verified with /proc/cpuinfo):

Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

These include asimd (NEON) and asimddp (Dot Product), among others.
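
For anyone who wants to repeat this check, here is a minimal sketch that parses the Features line and reports which required flags are missing. The helper names (`parse_cpu_features`, `missing_features`) are hypothetical, and the required set (asimd, asimddp) is my assumption about what the NEON dotprod kernels need, not something taken from the torchao build scripts.

```python
def parse_cpu_features(cpuinfo_text):
    """Return the set of flags on the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("asimd", "asimddp")):
    """List the required flags (an assumed set) that the CPU does not report."""
    have = parse_cpu_features(cpuinfo_text)
    return [flag for flag in required if flag not in have]


# The Features line reported above, used here as sample input.
sample = (
    "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs"
)
print(missing_features(sample))  # prints []
```

On a real machine, the text can be read with `open("/proc/cpuinfo").read()` and passed to `missing_features`.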
I'm particularly interested in the optimizations from the recently merged PR #2000, which added the new KleidiAI kernels for ARM NEON dotprod.
When I run the KleidiAI benchmark, I can see:

kai_matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod/m:64/n:64/k:64/bl:32    SKIPPED: 'GEMV optimized for m=1 only'
kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod/m:64/n:64/k:64/bl:32        6305 ns         6302 ns       111140

So it seems like the KleidiAI kernels themselves are working, but for some reason the _pack_8bit_act_4bit_weight operator isn't being registered properly in torchao.

Is there something specific I need to do to get the _pack_8bit_act_4bit_weight operator registered? Are there any diagnostic steps I can take to debug this further?
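
As a starting point for debugging, this is a rough sketch of how the registration can be checked directly, separate from the code path that raises the exception. It only relies on `torch.ops.torchao` raising `AttributeError` for an unknown operator, as the traceback above shows. The helper names (`op_is_registered`, `check_torchao_op`) are hypothetical, and it assumes that importing torchao is what loads the compiled kernels.

```python
import importlib


def op_is_registered(namespace, op_name):
    """True if the namespace exposes op_name (missing ops raise AttributeError)."""
    return hasattr(namespace, op_name)


def check_torchao_op(op_name="_pack_8bit_act_4bit_weight"):
    """Report whether torch.ops.torchao exposes op_name, or why the check failed."""
    try:
        torch = importlib.import_module("torch")
        # Assumption: importing torchao triggers loading of its compiled extensions.
        importlib.import_module("torchao")
    except Exception as exc:
        return f"import failed: {exc!r}"
    found = op_is_registered(torch.ops.torchao, op_name)
    return f"torch.ops.torchao.{op_name} registered: {found}"


if __name__ == "__main__":
    print(check_torchao_op())
```

If this reports that the operator is not registered, the problem is likely in how the extension was built or loaded rather than in the calling code.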

#1721 (comment)
