Description
Last time I checked, the Log-Mel spectrogram computed by whisper.cpp was not exactly identical to the one produced by the OpenAI implementation:
- whisper.cpp: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L2284-L2298
- OpenAI Whisper: https://github.com/openai/whisper/blob/main/whisper/audio.py#L92-L124
I think the spectrograms produced by the two methods should already be quite close, since transcription obviously works correctly. Nevertheless, it would be useful to compare the spectrograms in more detail and see whether the C++ code can be made to match the Python code more closely. Eliminating any differences in the audio input would make it easier to compare transcription results between the two codebases.
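One way to start such a comparison is to dump the mel matrices from both implementations (e.g. as NumPy arrays) and compute simple error metrics over them. The sketch below is only an illustration of that idea, not code from either repository; the `compare_spectrograms` helper and the synthetic input arrays are hypothetical, standing in for the real `(80, n_frames)` outputs of the two codebases.

```python
import numpy as np

def compare_spectrograms(mel_a: np.ndarray, mel_b: np.ndarray) -> dict:
    """Return basic error metrics between two log-Mel spectrograms
    of identical shape (e.g. 80 mel bins x n_frames)."""
    assert mel_a.shape == mel_b.shape, "spectrograms must have the same shape"
    diff = np.abs(mel_a - mel_b)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
    }

# Synthetic data standing in for the two implementations' output;
# in practice, load the mel matrices dumped from whisper.cpp and
# from the OpenAI Python code instead.
rng = np.random.default_rng(0)
ref = rng.standard_normal((80, 3000))
approx = ref + 1e-5 * rng.standard_normal((80, 3000))

metrics = compare_spectrograms(ref, approx)
print(metrics)
```

Per-frame or per-bin breakdowns of the same difference would then point at where the implementations diverge, e.g. in the STFT windowing, the mel filter bank, or the final log/clamp step.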
This should be a good exercise for anyone looking to start contributing to the project, so feel free to open a PR or discuss your findings!