Are there fine-tuning and inference scripts available for int4 quantization of bloom-7b? Is it possible to keep GPU memory usage under 10 GB?


I noticed that int8 quantization is available, but is there an option for int4 quantization?
What is the GPU memory overhead for int4 and int8 when fine-tuning with LoRA or P-Tuning?
Additionally, are there inference scripts available for int4 quantization? How much GPU memory is required for int4 and int8 inference, respectively?
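
For context, here is my rough back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: the parameter count (about 7.1 billion) is my assumption for bloom-7b, `weight_gib` is a helper name I made up, and the numbers ignore activations, the KV cache, LoRA/P-Tuning parameters, optimizer state, and quantization metadata such as scales, so actual usage will be higher.

```python
# Rough weight-only memory estimate. Ignores activations, KV cache,
# adapter parameters, optimizer state, and quantization scales/zero points.
PARAMS = 7.1e9  # assumption: approximate parameter count of bloom-7b

BITS = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gib(num_params: float, bits: int) -> float:
    """Return the size of the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits / 8 / 2**30


for name, bits in BITS.items():
    print(f"{name}: {weight_gib(PARAMS, bits):.1f} GiB")
```

By this estimate the quantized weights on their own would fit within 10 GB, so my question is really about how much the remaining overhead adds during fine-tuning and inference.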