Closed
Labels
feature request (New feature or request)
Description
Just wanted to report that this works perfectly on my GTX 1060 (6 GB) with my old i5-7200 and 16 GB RAM under Windows 10. So far, I have never reached such speed with any other existing solution (oobabooga, textsynth, llama.cpp). Not a single issue during install. I can't tell exactly, but it's surely a couple of tokens/sec during inference. I need a deeper dive to get a feel for the quality, as it seems to be a model quantized to int3?
Now we want more: more models, 13B sizes, parameter access (temp, top-p, etc.), and an API. Anyway, I think this is great work already!
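For anyone wondering what "parameter access" would mean concretely, here is a minimal, generic sketch of temperature and top-p (nucleus) sampling applied to raw logits. This is not the project's actual API; the function name `sample_token` and its signature are hypothetical and only illustrate the knobs being requested.

```python
# Generic illustration of temperature + top-p (nucleus) sampling.
# NOT the project's API -- a hypothetical sketch of the requested parameters.
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 0.8, top_p: float = 0.95) -> int:
    """Sample a token id from logits using temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling: lower values sharpen the distribution, higher values flatten it.
    scaled = logits / max(temperature, 1e-5)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus filtering: keep the smallest set of top tokens whose cumulative mass exceeds top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]

    # Renormalize over the kept tokens and sample.
    kept = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=kept))

# Toy example with a 5-token vocabulary.
print(sample_token(np.array([2.0, 1.0, 0.5, 0.1, -1.0])))
```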