Refactor inference processes & add new engines (FasterWhisper, vLLM) #141
Signed-off-by: Sasha Meister <[email protected]>
sdp/processors/inference/asr/post_processing/whisper_hallucinations.py
Determine if generation should be replaced with reference text based on
CER and uppercase ratio.
"""
chars = generation.replace(' ', '')
Why do we need chars here? Is it necessary to remove the spaces?
@lilithgrigoryan, thanks for the review!
This processor is used to select either the original text or a Qwen generation with restored punctuation.
One of the selection criteria is that if the model over-capitalizes the text (above a specified upper_case_threshold), we consider the generation poor.
To check this, we look only at non-space characters to compute the ratio of capital to lowercase letters.
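As a rough illustration of the check described above, here is a minimal sketch. It assumes the ratio is the share of uppercase characters among all non-space characters; the function names `uppercase_ratio` and `is_over_capitalized` and the default threshold value are hypothetical and not taken from the PR.

```python
def uppercase_ratio(generation: str) -> float:
    """Share of uppercase characters among non-space characters (assumed definition)."""
    # Spaces carry no case information, so drop them before counting.
    chars = generation.replace(' ', '')
    if not chars:
        # Empty or whitespace-only text: avoid division by zero.
        return 0.0
    return sum(c.isupper() for c in chars) / len(chars)


def is_over_capitalized(generation: str, upper_case_threshold: float = 0.5) -> bool:
    """Flag a generation as poor when its uppercase ratio exceeds the threshold.

    The default of 0.5 is a placeholder, not the value used in the PR.
    """
    return uppercase_ratio(generation) > upper_case_threshold
```

With spaces left in, a long sentence made of many short words would get a lower ratio than the same letters without spaces, which is presumably why they are excluded from the denominator.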
sdp/processors/inference/asr/faster_whisper/faster_whisper_inference.py
Description
New processors added:
- FasterWhisperInference, based on SYSTRAN/faster-whisper
- vLLMInference, based on vllm-project/vllm

New post-processing processors:
- DetectWhisperHallucinationFeatures
- CleanQwenGeneration
Misc: