Description
Last time I checked, the Log-Mel spectrogram computed by whisper.cpp was not exactly identical to the one produced by the OpenAI implementation:
- whisper.cpp: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L2284-L2298
- OpenAI Whisper: https://github.com/openai/whisper/blob/main/whisper/audio.py#L92-L124
I think the spectrograms produced by the two methods should already be quite close, since transcription obviously works correctly. Nevertheless, it would be useful to compare the spectrograms in more detail and see whether the C++ code can be made to match the Python code more closely. Eliminating any differences in the audio input would make it easier to compare transcription results between the two codebases.
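One way to start such a comparison is to dump the mel matrices from both implementations (e.g. as NumPy arrays) and compute simple error metrics over them. The sketch below is only an illustration of that idea, not code from either repository; the `compare_spectrograms` helper and the synthetic input arrays are hypothetical, standing in for the real `(80, n_frames)` outputs of the two codebases.

```python
import numpy as np

def compare_spectrograms(mel_a: np.ndarray, mel_b: np.ndarray) -> dict:
    """Return basic error metrics between two log-Mel spectrograms
    of identical shape (e.g. 80 mel bins x n_frames)."""
    assert mel_a.shape == mel_b.shape, "spectrograms must have the same shape"
    diff = np.abs(mel_a - mel_b)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
    }

# Synthetic data standing in for the two implementations' output;
# in practice, load the mel matrices dumped from whisper.cpp and
# from the OpenAI Python code instead.
rng = np.random.default_rng(0)
ref = rng.standard_normal((80, 3000))
approx = ref + 1e-5 * rng.standard_normal((80, 3000))

metrics = compare_spectrograms(ref, approx)
print(metrics)
```

Per-frame or per-bin breakdowns of the same difference would then point at where the implementations diverge, e.g. in the STFT windowing, the mel filter bank, or the final log/clamp step.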
This should be a good exercise for anyone looking to start contributing to the project, so feel free to open a PR or discuss your findings!