Need help with running inference faster in GPU #1999
Unanswered
Vishnu280412 asked this question in Q&A
Replies: 1 comment 2 replies
-
Hi @Vishnu280412 👋

This class should be initialized only once in your FastAPI app; otherwise it will be re-initialized on every request. Additionally, you can try running with half precision, or check whether the compiled models are faster (https://mindee.github.io/doctr/using_doctr/using_model_export.html#compiling-your-models-pytorch-only):

```python
self.model = ocr_predictor(
    det_arch=det_model, reco_arch=reco_model, pretrained=False
).to(COMPUTE_DEVICE).half()
```
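For illustration, a minimal sketch of wiring this into FastAPI so the predictor is built once at startup and reused by every request; the pretrained architecture names and the `/ocr` route below are placeholders, not taken from the original code:

```python
import torch
from fastapi import FastAPI, File, UploadFile

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

COMPUTE_DEVICE = torch.device("cuda")

# Build the predictor once at import/startup, not inside the endpoint;
# swap the string architectures for the fine-tuned det/reco model instances.
predictor = ocr_predictor(
    det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True
).to(COMPUTE_DEVICE).half()

app = FastAPI()


@app.post("/ocr")
async def run_ocr(files: list[UploadFile] = File(...)):
    # Every request reuses the already-loaded predictor
    images = [await f.read() for f in files]
    doc = DocumentFile.from_images(images)
    return predictor(doc).export()
```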
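A rough sketch of the compiled-model option along the lines of the linked docs, again with placeholder pretrained architectures; whether it actually helps depends on the hardware, so it is worth benchmarking both variants:

```python
import torch
from doctr.models import crnn_vgg16_bn, db_resnet50, ocr_predictor

# Compile the detection and recognition models once (PyTorch >= 2.0),
# then build the predictor from the compiled instances
det_model = torch.compile(db_resnet50(pretrained=True).eval())
reco_model = torch.compile(crnn_vgg16_bn(pretrained=True).eval())

compiled_predictor = ocr_predictor(
    det_arch=det_model, reco_arch=reco_model
).to("cuda")
```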
2 replies
-
Hi @felixdittrich92 👋,
So I was trying to create a FastAPI endpoint for DocTR for my project, and I am using a fine-tuned model. I have also installed the CUDA-enabled torch build, and I wanted to make sure that what I am doing to run inference is correct, because it took around 10 seconds to process 5 images.
The model loading part:
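It looks roughly like this (the architecture names and checkpoint paths below are placeholders for the fine-tuned models):

```python
import torch
from doctr.models import crnn_vgg16_bn, db_resnet50, ocr_predictor

COMPUTE_DEVICE = torch.device("cuda")


class OCRModel:
    def __init__(self, det_ckpt: str, reco_ckpt: str):
        # Fine-tuned detection / recognition models (placeholder architectures and paths)
        det_model = db_resnet50(pretrained=False)
        det_model.load_state_dict(torch.load(det_ckpt, map_location="cpu"))
        reco_model = crnn_vgg16_bn(pretrained=False)
        reco_model.load_state_dict(torch.load(reco_ckpt, map_location="cpu"))

        self.model = ocr_predictor(
            det_arch=det_model, reco_arch=reco_model, pretrained=False
        ).to(COMPUTE_DEVICE)
```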
The `COMPUTE_DEVICE` in both cases is set to `cuda`, and I am running it on a device with an NVIDIA graphics card that supports CUDA. Tell me if what I am doing is correct or if any changes have to be made. Or is the time taken expected to be this long?