Open
Labels
bug: Something isn't working
Description
When transcribing a Hindi audio file, I get invalid Unicode characters in the output.
I have noticed this with many Hindi files.
I used the whisper-large-v2 model for inference on CPU, and I see the same issue when running inference on GPU.
My guess is that the model's token output (BPE-encoded) is not being mapped correctly to Unicode characters.
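
To illustrate the guess above, here is a minimal standalone sketch of how this kind of corruption can happen with a byte-level BPE vocabulary. It does not call Whisper and is not a confirmed diagnosis. It assumes that a token boundary can fall in the middle of a multi-byte UTF-8 character; the split points below are made up for illustration, not taken from the real tokenizer.

```python
# Each Devanagari code point takes 3 bytes in UTF-8.
text = "नमस्ते"
data = text.encode("utf-8")

# Hypothetical token boundaries (assumption): offsets 4 and 11 both
# fall inside a 3-byte character. Real BPE merges are learned.
split_points = [0, 4, 11, len(data)]
tokens = [data[a:b] for a, b in zip(split_points, split_points[1:])]

# Decoding each token on its own: incomplete byte sequences are
# replaced with U+FFFD, the Unicode replacement character.
piecewise = "".join(t.decode("utf-8", errors="replace") for t in tokens)

# Decoding only after concatenating all token bytes keeps the text intact.
joined = b"".join(tokens).decode("utf-8", errors="replace")

print("piecewise:", piecewise)
print("joined:   ", joined)
print("piecewise has U+FFFD:", "\ufffd" in piecewise)
print("joined matches input:", joined == text)
```

If the transcript shows U+FFFD at positions where a character should be, that would be consistent with bytes being decoded before the full character has been assembled, for example at a segment or timestamp boundary.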