Need help with running inference faster in GPU #1999
Unanswered
Vishnu280412 asked this question in Q&A
Replies: 1 comment 2 replies
-
Hi @Vishnu280412 👋

This class should be initialized only once in your FastAPI app; otherwise it will be re-initialized on every request. Additionally, you can try running with half precision, or check whether the compiled models are faster (https://mindee.github.io/doctr/using_doctr/using_model_export.html#compiling-your-models-pytorch-only):

```python
self.model = ocr_predictor(
    det_arch=det_model, reco_arch=reco_model, pretrained=False
).to(COMPUTE_DEVICE).half()
```
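For illustration, a minimal sketch of wiring this into FastAPI so the predictor is built once at startup and reused by every request; the pretrained architecture names and the `/ocr` route below are placeholders, not taken from the original code:

```python
import torch
from fastapi import FastAPI, File, UploadFile

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

COMPUTE_DEVICE = torch.device("cuda")

# Build the predictor once at import/startup, not inside the endpoint;
# swap the string architectures for the fine-tuned det/reco model instances.
predictor = ocr_predictor(
    det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True
).to(COMPUTE_DEVICE).half()

app = FastAPI()


@app.post("/ocr")
async def run_ocr(files: list[UploadFile] = File(...)):
    # Every request reuses the already-loaded predictor
    images = [await f.read() for f in files]
    doc = DocumentFile.from_images(images)
    return predictor(doc).export()
```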
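A rough sketch of the compiled-model option along the lines of the linked docs, again with placeholder pretrained architectures; whether it actually helps depends on the hardware, so it is worth benchmarking both variants:

```python
import torch
from doctr.models import crnn_vgg16_bn, db_resnet50, ocr_predictor

# Compile the detection and recognition models once (PyTorch >= 2.0),
# then build the predictor from the compiled instances
det_model = torch.compile(db_resnet50(pretrained=True).eval())
reco_model = torch.compile(crnn_vgg16_bn(pretrained=True).eval())

compiled_predictor = ocr_predictor(
    det_arch=det_model, reco_arch=reco_model
).to("cuda")
```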
2 replies
-
Hi @felixdittrich92 👋,
So I was trying to create a FastAPI endpoint for DocTR for my project, and I am using a fine-tuned model. I have also installed the CUDA-enabled torch build, and I wanted to make sure that what I am doing to run inference is correct, because it took around 10 seconds to process 5 images.
The model loading part:
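It looks roughly like this (the architecture names and checkpoint paths below are placeholders for the fine-tuned models):

```python
import torch
from doctr.models import crnn_vgg16_bn, db_resnet50, ocr_predictor

COMPUTE_DEVICE = torch.device("cuda")


class OCRModel:
    def __init__(self, det_ckpt: str, reco_ckpt: str):
        # Fine-tuned detection / recognition models (placeholder architectures and paths)
        det_model = db_resnet50(pretrained=False)
        det_model.load_state_dict(torch.load(det_ckpt, map_location="cpu"))
        reco_model = crnn_vgg16_bn(pretrained=False)
        reco_model.load_state_dict(torch.load(reco_ckpt, map_location="cpu"))

        self.model = ocr_predictor(
            det_arch=det_model, reco_arch=reco_model, pretrained=False
        ).to(COMPUTE_DEVICE)
```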
The `COMPUTE_DEVICE` in both cases is set to `cuda`, and I am running it on a device with an NVIDIA graphics card that supports CUDA. Tell me if what I am doing is correct or if any changes have to be made. Or is the time taken expected to be this long?