Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Current Behavior
The program crashes when I enable two GPU cards, and produces garbage output when I enable only one GPU card.
Two Cards:
PS C:\gpt\llama.cpp> .\build\bin\RelWithDebInfo\main.exe -m ..\en-models\7B\ggml-alpaca-7b-q4.bin -p "what is cuda?" -ngl 40
main: build = 635 (5c64a09)
main: seed = 1686202333
ggml_init_cublas: found 2 CUDA devices:
Device 0: Tesla P100-PCIE-16GB
Device 1: NVIDIA GeForce GTX 1070
llama.cpp: loading model from ..\en-models\7B\ggml-alpaca-7b-q4.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (Tesla P100-PCIE-16GB) as main device
llama_model_load_internal: mem required = 1932.71 MB (+ 1026.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: offloading output layer to GPU
llama_model_load_internal: total VRAM used: 3987 MB
...................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
what is cuda?CUDA error 9 at C:\GPT\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
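For context on the failure above: CUDA error 9 is `cudaErrorInvalidConfiguration` ("invalid configuration argument"), which the runtime reports when a kernel is launched with grid/block dimensions the device cannot accept (for example, a grid dimension of zero or a block size above the device limit). The sketch below is only an illustration of how that error surfaces, not the actual code at ggml-cuda.cu:1574; the zero-row split is an assumed scenario for demonstration.

```cuda
// Minimal illustration (not llama.cpp code): a kernel launch whose grid
// dimension collapses to 0 triggers cudaErrorInvalidConfiguration (error 9),
// the same error string seen in the log above.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(float * dst) {
    dst[threadIdx.x] = 0.0f;
}

int main() {
    float * d_buf;
    cudaMalloc(&d_buf, 256 * sizeof(float));

    int num_rows = 0;        // hypothetical: a bad device split leaves 0 rows on this GPU
    dim3 grid(num_rows);     // grid dimension of 0 -> invalid launch configuration
    dim3 block(256);

    dummy_kernel<<<grid, block>>>(d_buf);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        // prints: CUDA error 9: invalid configuration argument
        fprintf(stderr, "CUDA error %d: %s\n", (int) err, cudaGetErrorString(err));
    }

    cudaFree(d_buf);
    return 0;
}
```

A launch configuration like this can go wrong when per-device work is computed from a tensor split across GPUs, which would be consistent with the crash appearing only once the second card is enabled.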