KleidiAI int4 kernels not loading properly on aarch64 Linux

Hello there!

I have been trying to follow your instructions to get KleidiAI int4 kernels working on a Scaleway ARM instance (4x16), but I'm still encountering issues.
I've done the following:
- Built and installed KleidiAI (the library is installed at `/usr/local/lib/libkleidiai.a`)
- Built torchao with the flags you mentioned: `USE_CPP=1 TORCHAO_BUILD_CPU_AARCH64=1 TORCHAO_BUILD_KLEIDIAI=1 pip install .`

However, when I try to run code that uses the KleidiAI kernels, I get this error:
```logs
AttributeError: '_OpNamespace' 'torchao' object has no attribute '_pack_8bit_act_4bit_weight'

Exception: TorchAO experimental kernels are not loaded. To install the kernels, run `USE_CPP=1 pip install .` from ao on a machine with an ARM CPU. You can also set target to 'aten' if you are using ARM CPU.
```

My CPU definitely has the required ARM features (verified with `/proc/cpuinfo`):
```logs
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
```

These include `asimd` (NEON) and `asimddp` (dot product), among others.
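
The feature check above can also be scripted. This is a minimal sketch, assuming that `asimd` and `asimddp` are the only flags worth testing for (that set is taken from this issue, not from the torchao or KleidiAI documentation); the helper names `parse_features` and `missing_features` are hypothetical:

```python
from pathlib import Path

# Assumption: NEON and dot product are the flags of interest, as listed above.
REQUIRED = {"asimd", "asimddp"}


def parse_features(cpuinfo_text):
    """Return the flags from the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=REQUIRED):
    """Return the required flags that the CPU does not report, sorted."""
    return sorted(set(required) - parse_features(cpuinfo_text))


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print("missing:", missing_features(text) or "none")
```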
I'm particularly interested in the optimizations from the recently merged PR #2000, which added the new KleidiAI kernels for ARM NEON dotprod.
When I run the KleidiAI benchmark, I can see:
```logs
kai_matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod/m:64/n:64/k:64/bl:32    SKIPPED: 'GEMV optimized for m=1 only'
kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod/m:64/n:64/k:64/bl:32        6305 ns         6302 ns       111140
```

So it seems like the KleidiAI kernels themselves are working, but for some reason the `_pack_8bit_act_4bit_weight` operator isn't being registered properly in torchao.

Is there something specific I need to do to get the `_pack_8bit_act_4bit_weight` operator registered? Are there any diagnostic steps I can take to debug this further?
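
As a starting point for such a diagnostic, here is a minimal sketch that checks two things: whether the installed `torchao` package contains any compiled `.so` files at all, and whether the operator from the error message resolves under `torch.ops.torchao`. The helper names `find_native_libs` and `op_registered` are hypothetical, and the sketch assumes that importing `torchao` is what loads its compiled ops; it does not rely on any torchao-specific API.

```python
import importlib.util
from pathlib import Path

# Operator name taken from the AttributeError in this issue.
OP_NAME = "_pack_8bit_act_4bit_weight"


def find_native_libs(package_dir):
    """List compiled extension files (*.so) found anywhere under package_dir."""
    return sorted(str(p) for p in Path(package_dir).rglob("*.so"))


def op_registered(ops_root, namespace, name):
    """True if ops_root.<namespace>.<name> resolves, e.g. torch.ops.torchao.<name>."""
    ns = getattr(ops_root, namespace, None)
    return ns is not None and hasattr(ns, name)


def main():
    spec = importlib.util.find_spec("torchao")
    if spec is None or not spec.submodule_search_locations:
        print("torchao is not importable in this environment")
        return
    pkg_dir = list(spec.submodule_search_locations)[0]
    print("torchao location:", pkg_dir)
    print("native libraries:", find_native_libs(pkg_dir) or "none found")

    import torch
    import torchao  # noqa: F401  (assumption: this import loads the compiled ops)

    print(OP_NAME, "registered:", op_registered(torch.ops, "torchao", OP_NAME))


if __name__ == "__main__":
    main()
```

If no `.so` files are listed, the C++ extension was probably not built or not installed alongside the Python package. If the printed location is the source checkout rather than `site-packages`, the script may be picking up the unbuilt source tree instead of the installed package.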

https://github.com/pytorch/ao/pull/1721#issuecomment-2835619710

KleidiAI int4 kernels not loading properly on aarch64 Linux #2143