@danielzgtg
Collaborator

This came out of investigating #48. A conv1d with K=1 is not really a convolution, so it reduces to a matmul, which can be done in TTS.cpp without modifying ggml-cpu.c. I originally had code for this in ggml.c but inlined it into TTS.cpp to avoid extra transposes.
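
To illustrate the K=1 claim, here is a minimal sketch in plain C++ (no ggml) comparing a reference conv1d against a plain matrix multiply. The `Matrix` type, the function names, and the `[c_out][c_in][k]` weight layout are assumptions made for this example and are not taken from TTS.cpp; stride 1 with no padding or dilation is also assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix (rows x cols), used only in this sketch.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Reference conv1d: stride 1, no padding, no dilation (assumed).
// weight is laid out as [c_out][c_in][k]; x is [c_in][t].
Matrix conv1d(const std::vector<float>& weight, std::size_t c_out, std::size_t k,
              const Matrix& x, const std::vector<float>& bias) {
    const std::size_t c_in = x.rows;
    const std::size_t t_out = x.cols - k + 1;
    Matrix y(c_out, t_out);
    for (std::size_t o = 0; o < c_out; ++o) {
        for (std::size_t t = 0; t < t_out; ++t) {
            float acc = bias[o];
            for (std::size_t c = 0; c < c_in; ++c) {
                for (std::size_t j = 0; j < k; ++j) {
                    acc += weight[(o * c_in + c) * k + j] * x.at(c, t + j);
                }
            }
            y.at(o, t) = acc;
        }
    }
    return y;
}

// K=1 shortcut: a [c_out][c_in][1] kernel is just a c_out x c_in matrix,
// so the output is W * X plus a per-row bias.
Matrix pointwise_as_matmul(const std::vector<float>& weight, std::size_t c_out,
                           const Matrix& x, const std::vector<float>& bias) {
    const std::size_t c_in = x.rows;
    Matrix y(c_out, x.cols);
    for (std::size_t o = 0; o < c_out; ++o) {
        for (std::size_t t = 0; t < x.cols; ++t) {
            float acc = bias[o];
            for (std::size_t c = 0; c < c_in; ++c) {
                acc += weight[o * c_in + c] * x.at(c, t);
            }
            y.at(o, t) = acc;
        }
    }
    return y;
}
```

With `k = 1` the inner kernel loop runs exactly once, so both functions perform the same multiply-adds in the same order. In ggml the equivalent is a `ggml_mul_mat` plus a bias add, with whatever transposes the tensor layout requires; this sketch ignores layout entirely.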

Unfortunately, the performance improvement was not statistically significant: before=1822.336528ms, after=1804.631546ms (about 1%) with the Kokoro perf_battery, and random variation reversed the trend once. The PR that improves performance non-negligibly today on Linux is mmwillet/ggml#2.

Nevertheless, this eliminates a bunch of transposes. There are now ~100 fewer ggml_tensors that take >1ms in cli.

@danielzgtg danielzgtg force-pushed the optimize/k1conv1d branch from 3ccef55 to d379a5f Compare May 19, 2025 04:08
Owner

@mmwillet mmwillet left a comment

Mostly looks good. I'm going to check the norm to confirm it doesn't change the tensor output, but otherwise ready for merge imo.

Comment on lines +1180 to +1188
n = ggml_cont(ctx, ggml_transpose(ctx, n));
n = ggml_mul_mat(ctx, model->prosody_pred->n_proj_kernel, n);
n = squeeze_3d_2d_e0(ctx, n);
n = ggml_add(ctx, n, model->prosody_pred->n_proj_bias);
Owner

spacing seems weird here. Should we add a linter?

Collaborator Author

@mmwillet This file had a mix of tabs and spaces, including a mix on a single line, which I tried to replicate.

In #58, I normalize everything to 4 spaces. Do you have a .clang-format file that you prefer?

Owner

I do not (please forgive me, I am one of those people who ignore style as long as the code runs). A dice roll for any built-in style is fine by me.

@danielzgtg danielzgtg force-pushed the optimize/k1conv1d branch from d379a5f to 7cea743 Compare May 28, 2025 18:17
@danielzgtg danielzgtg force-pushed the optimize/k1conv1d branch from 7cea743 to 0bcb0a6 Compare May 29, 2025 00:33
@mmwillet mmwillet merged commit 8e50738 into mmwillet:main May 29, 2025
2 checks passed