TensorRT & Flux Dev #4484
-
@comfyanonymous What am I doing wrong? Does TensorRT not support 16 GB of VRAM? Thank you in advance.
Replies: 9 comments 11 replies
-
Can someone please help?
-
TensorRT currently needs more than 24 GB of VRAM to convert a Flux model; even a 4090 isn't enough.
-
Couldn't it be split into chunks and saved, looping between the GPU and CPU?
-
I am going to get so mad if Nvidia doesn't start putting out affordable 48 GB cards soon. Come on, even a 4090 Ti with 36 GB of VRAM and 20,000 tensor cores would be great.
-
I could convert it at my workplace, but if I remember correctly the TRT engine will be specific to the GPU I used and not portable to another? I tried it with the schnell fp8 checkpoint and ran into an error: [09/24/2024-07:07:26] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (Networks with BF16 precision require hardware with BF16 support.)
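If it helps, a quick way to check whether the card you're converting on actually supports bfloat16 (standard PyTorch API, nothing Flux-specific):

```python
import torch

# Check whether the current GPU supports bfloat16; the TRT error above points at
# hardware without BF16 support (roughly, pre-Ampere cards).
print(torch.cuda.get_device_name(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```

And as far as I know, TensorRT engines are tied at least to the GPU architecture and TensorRT version they were built with, so an engine built at work won't necessarily load on a different card at home.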
-
You can try converting the Flux model using an Ada 6000 and then run the engine on a 4090. However, this only works if nothing else is running on the 4090: no monitor plugged into it and no applications using it. Alternatively, you can try fp8 + --fast + the torch compile node.
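For anyone wondering what the torch compile part amounts to outside of ComfyUI, here is a minimal sketch using a tiny stand-in module instead of the Flux UNet (loading Flux itself is out of scope here); inside ComfyUI you would just launch with --fast and add the torch compile node to your workflow:

```python
import torch
import torch.nn as nn

# Tiny stand-in module instead of the Flux UNet, purely to illustrate torch.compile.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)).cuda().half()
compiled = torch.compile(model)  # torch.compile is the standard PyTorch API

x = torch.randn(1, 64, device="cuda", dtype=torch.float16)
with torch.no_grad():
    y = compiled(x)  # first call triggers compilation; subsequent calls are faster
```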
-
I compiled Flux with TensorRT, using the defaults in the TensorRT node pack, and observed better performance*.
On RTX A5000, PyTorch 2.4.1, TensorRT 10.5.0, Windows
-
I am trying to convert the model but am still running into issues, mostly because torch.bfloat16 seems to cause a compile error: BackendCompilerFailed: backend='tensorrt' raised:
TypeError: Unsupported numpy dtype
While executing %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%_to_copy, 1000), kwargs = {_itensor_to_tensor_meta: {<tensorrt_bindings.tensorrt.ITensor object at 0x7881fc106c30>: ((3072, 384), torch.bfloat16, True, (384, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7881f4b48db0>: ((384, 3072), torch.bfloat16, False, (1, 384), None, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d0930>: ((1, 3456, 384), torch.bfloat16, False, (1327104, 384, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d0db0>: ((3456, 384), torch.bfloat16, False, (384, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d2530>: ((3072,), torch.bfloat16, True, (1,), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7881f4b8d5b0>: ((3456, 3072), torch.bfloat16, False, (3072, 1), torch.contiguous_format, False, {}), <tensorrt_bindings.tensorrt.ITensor object at 0x7886408d3530>: ((1,), torch.bfloat16, False, (1,), torch.contiguous_format, False, {})}})
Original traceback:
None
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
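The "Unsupported numpy dtype" is most likely because NumPy has no native bfloat16 type, which some torch_tensorrt versions trip over. One thing worth trying is casting to float16 before compiling; a sketch only, with a tiny stand-in module rather than the actual Flux model, and no guarantee fp16 is acceptable quality-wise for Flux:

```python
import torch
import torch.nn as nn

# Stand-in module; the point is only the dtype, not the architecture.
model = nn.Sequential(nn.Linear(384, 3072), nn.GELU(), nn.Linear(3072, 384))
model = model.to(device="cuda", dtype=torch.float16)  # fp16 instead of bfloat16

# backend="tensorrt" requires torch_tensorrt to be installed (it registers the backend).
compiled = torch.compile(model, backend="tensorrt")

x = torch.randn(1, 384, device="cuda", dtype=torch.float16)
with torch.no_grad():
    out = compiled(x)  # compilation happens on the first call
```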
-
So this is all bullshit then?