Code generator for inference on quantized Large Language Models. Quantization is done using GPTQ.
- Support for LLaMA and OPT
- 4-, 3-, and 2-bit inference
- x86 with AVX2 support
- Support for PyTorch and transformers
- Support for generic quantization group size
- Support for ARM Neon
- Support for AVX512
- Include quantization error analysis in code generation
- Install dependencies via `pip install -r requirements.txt`
- Install transformers from source: `pip install git+https://github.com/huggingface/transformers`
- Install the python module: `python setup.py install`. This will run a search to find the best parameters for register usage.
We give an example notebook in `demo.ipynb`. The basic workflow is:
- load the floating point model,
- load the quantized checkpoint from GPTQ,
- call the `infergen.swap_modules_llama(model, quantized_checkpoint, bits=4, p=64, l1=l1, inplace=False)` function, where `model` is the full-size model, `quantized_checkpoint` is the quantized model, `bits` is the number of bits used for the quantization, `l1` is the size of the L1 data cache in bits, `p` is the number of cores to use, and `inplace` is a flag to swap in place or create a copy.
- Use the quantized model as a normal transformer (a short sketch follows below).
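
Below is a minimal sketch of this workflow for a 4-bit LLaMA model. The model path, checkpoint path, `l1` value, and prompt are illustrative placeholders, and it assumes that with `inplace=False` the function returns the swapped copy; only `infergen.swap_modules_llama` and its arguments come from the description above.

```python
# Minimal usage sketch; paths, l1, and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import infergen

model_path = "path/to/llama-7b"                    # full-precision LLaMA weights
checkpoint_path = "path/to/llama-7b-4bit-gptq.pt"  # quantized checkpoint from GPTQ

# 1. Load the floating point model.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)

# 2. Load the quantized checkpoint produced by GPTQ.
quantized_checkpoint = torch.load(checkpoint_path)

# 3. Swap the modules for the generated quantized kernels.
#    l1 is the L1 data cache size in bits (here 32 KiB = 262144 bits),
#    p is the number of cores to use.
model = infergen.swap_modules_llama(
    model, quantized_checkpoint, bits=4, p=64, l1=262144, inplace=False
)

# 4. Use the quantized model as a normal transformer.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The `demo.ipynb` notebook walks through the same steps end to end.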