TensorRT & Flux Dev #4484
-
@comfyanonymous What am I doing wrong? Does TensorRT not support 16 GB of VRAM? Thank you in advance.
Replies: 9 comments 11 replies
-
Can someone please help?
-
TensorRT currently needs more than 24 GB of VRAM to convert a Flux model; even a 4090 isn't enough.
-
Couldn't it be split into chunks and saved, looping between the GPU and CPU?
-
I am going to get so mad if Nvidia doesn't start putting out affordable 48 GB cards soon. Come on, even a 4090 Ti with 36 GB of VRAM and 20,000 tensor cores would be great.
-
I could convert it at my workplace, but if I remember correctly the TRT engine will be specific to the GPU I used and not portable to another? I tried it with the schnell fp8 checkpoint and ran into an error: [09/24/2024-07:07:26] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (Networks with BF16 precision require hardware with BF16 support.)
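If it helps, a quick way to check whether the card you're converting on actually supports bfloat16 (standard PyTorch API, nothing Flux-specific):

```python
import torch

# Check whether the current GPU supports bfloat16; the TRT error above points at
# hardware without BF16 support (roughly, pre-Ampere cards).
print(torch.cuda.get_device_name(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```

And as far as I know, TensorRT engines are tied at least to the GPU architecture and TensorRT version they were built with, so an engine built at work won't necessarily load on a different card at home.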
-
You can try converting the Flux model using an Ada 6000 and then run the engine on a 4090. However, this only works if nothing else is running on the 4090: no monitor plugged into it and no applications using it. Alternatively, you can try fp8 + --fast + the torch compile node.
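For anyone wondering what the torch compile part amounts to outside of ComfyUI, here is a minimal sketch using a tiny stand-in module instead of the Flux UNet (loading Flux itself is out of scope here); inside ComfyUI you would just launch with --fast and add the torch compile node to your workflow:

```python
import torch
import torch.nn as nn

# Tiny stand-in module instead of the Flux UNet, purely to illustrate torch.compile.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)).cuda().half()
compiled = torch.compile(model)  # torch.compile is the standard PyTorch API

x = torch.randn(1, 64, device="cuda", dtype=torch.float16)
with torch.no_grad():
    y = compiled(x)  # first call triggers compilation; subsequent calls are faster
```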
-
I compiled Flux with TensorRT, using the defaults in the TensorRT node pack, and observed better performance*.
On RTX A5000, PyTorch 2.4.1, TensorRT 10.5.0, Windows
-
I am trying to convert the model but am still running into issues, mostly because torch.bfloat16 seems to cause a compile error: BackendCompilerFailed: backend='tensorrt' raised:
TypeError: Unsupported numpy dtype
While executing %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%_to_copy, 1000), kwargs = {_itensor_to_tensor_meta: {<tensorrt_bindings.tensorrt.ITensor object at 0x7881fc106c30>: ((3072, 384), torch.bfloat16, True, (384, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7881f4b48db0>: ((384, 3072), torch.bfloat16, False, (1, 384), None, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d0930>: ((1, 3456, 384), torch.bfloat16, False, (1327104, 384, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d0db0>: ((3456, 384), torch.bfloat16, False, (384, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d2530>: ((3072,), torch.bfloat16, True, (1,), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7881f4b8d5b0>: ((3456, 3072), torch.bfloat16, False, (3072, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d3530>: ((1,), torch.bfloat16, False, (1,), torch.contiguous_format, False, {})}})
Original traceback:
None
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
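The "Unsupported numpy dtype" is most likely because NumPy has no native bfloat16 type, which some torch_tensorrt versions trip over. One thing worth trying is casting to float16 before compiling; a sketch only, with a tiny stand-in module rather than the actual Flux model, and no guarantee fp16 is acceptable quality-wise for Flux:

```python
import torch
import torch.nn as nn

# Stand-in module; the point is only the dtype, not the architecture.
model = nn.Sequential(nn.Linear(384, 3072), nn.GELU(), nn.Linear(3072, 384))
model = model.to(device="cuda", dtype=torch.float16)  # fp16 instead of bfloat16

# backend="tensorrt" requires torch_tensorrt to be installed (it registers the backend).
compiled = torch.compile(model, backend="tensorrt")

x = torch.randn(1, 384, device="cuda", dtype=torch.float16)
with torch.no_grad():
    out = compiled(x)  # compilation happens on the first call
```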
-
So this is all bullshit then?