Skip to content

Conversation

@njbrake
Copy link
Collaborator

@njbrake njbrake commented Oct 31, 2025

To test

# Check out this branch, run `make setup`  (this will setup up all the submodules and apply the patches)
make setup
# Git clone another copy
git clone [email protected]:mozilla-ai/llamafile.git tmp_copy
rm -rf tmp_copy/whisper.cpp
cp -r whisper.cpp tmp_copy
cd tmp_copy
git status

@njbrake njbrake requested a review from aittalam October 31, 2025 18:12
…pply-patches script perfectly lines up with the current whisper.cpp folder in the main branch
@njbrake
Copy link
Collaborator Author

njbrake commented Nov 1, 2025

Ok I think I see a few options here:

  1. We fork whisper.cpp as a separate repo, apply the patches there, and then pull it in to this project as part of a setup script. No git submodule, the dir it gets pulled into is git ignored
  2. We use the design of this PR where all the patches exist and are applied in the llamafile repo itself.
  3. Use a build system like Conan or cmake. I tend towards not wanting this rn since it'll add a little more tooling overhead but idk

Since the patches currently involve pulling in llamafile dependencies to whisper, option 2 seems like the easiest lift that would bring organizational improvements.

@njbrake njbrake marked this pull request as ready for review November 3, 2025 14:35
@njbrake njbrake linked an issue Nov 3, 2025 that may be closed by this pull request
@njbrake
Copy link
Collaborator Author

njbrake commented Nov 3, 2025

After discussion with @aittalam. we think the git submodule approach will give us a good base to split out third party logic from our codebase, so that we can have a good base to make informed upgrade decisions

@njbrake njbrake changed the title feat: integrate whisper.cpp as a submodule with patches chore: integrate whisper.cpp as a submodule Nov 3, 2025
Copy link
Member

@aittalam aittalam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @njbrake! I replicated this and verified it regenerates the same code we had before, while making changes more apparent. I think it is gonna be helpful to better understand what changed, how, and how to align the changes with more recent versions of the code.

I left a couple of questions / fixes, plus one more open one here: what is your take on bringing all customised files to the same dir vs keeping their original structure? I do not have strong opinions: I originally had the intuition that subdirs might make it easier to compare with original code, but now that we have patches for that (and an explicit mapping in the apply-patches.sh script) we probably do not need that much anymore.

Co-authored-by: Davide Eynard <[email protected]>
@njbrake
Copy link
Collaborator Author

njbrake commented Nov 3, 2025

Thanks @njbrake! I replicated this and verified it regenerates the same code we had before, while making changes more apparent. I think it is gonna be helpful to better understand what changed, how, and how to align the changes with more recent versions of the code.

I left a couple of questions / fixes, plus one more open one here: what is your take on bringing all customised files to the same dir vs keeping their original structure? I do not have strong opinions: I originally had the intuition that subdirs might make it easier to compare with original code, but now that we have patches for that (and an explicit mapping in the apply-patches.sh script) we probably do not need that much anymore.

For the moment, I think I would like to keep their original structure. In future PRs we can continue clean up to revert them to their original structure, but I wanted to keep this PR as simple as possible: to keep the risk of the PR at the lowest possible, I wanted this PR to only switch to using git submodules, and in future PRs we can consider directory restructuring.

@aittalam
Copy link
Member

aittalam commented Nov 3, 2025

For the moment, I think I would like to keep their original structure. In future PRs we can continue clean up to revert them to their original structure, but I wanted to keep this PR as simple as possible: to keep the risk of the PR at the lowest possible, I wanted this PR to only switch to using git submodules, and in future PRs we can consider directory restructuring.

Ah yes, completely agree with it for now! Our target ATM is to replicate exactly what we have, and just get more signal re: the changes

@njbrake njbrake merged commit 1a5d1c2 into main Nov 3, 2025
2 checks passed
@njbrake njbrake deleted the brake/whisper_submodule branch November 3, 2025 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Whisper.cpp as submodule

3 participants