Closed
Model: OpenCodeInterpreter-DS-6.7B (GGUFs)
This is a DeepSeek Coder instruct-based model with the llama architecture, but maybe something about it requires special handling?
Or maybe I did something wrong when converting these files from the original safetensors (I used the same build, b2249, for converting, quantizing, and running).
Both `-ngl 999` and `-ngl 0` produce the same exception:
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
llama.cpp build info
- b2249 (rev: 15499eb94227401bdc8875da6eb85c15d37068f7)
- compiled with LLAMA_METAL=1
- macOS M1 Pro
lldb stacktrace
Process 25487 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
->  0x188223330 <+0>:  pacibsp
    0x188223334 <+4>:  stp    x22, x21, [sp, #-0x30]!
    0x188223338 <+8>:  stp    x20, x19, [sp, #0x10]
    0x18822333c <+12>: stp    x29, x30, [sp, #0x20]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
    frame #1: 0x00000001000684c0 main`std::__1::__throw_out_of_range[abi:v160006](char const*) + 60
    frame #2: 0x000000010006a790 main`llama_byte_to_token(llama_vocab const&, unsigned char) + 472
    frame #3: 0x000000010003d270 main`llama_model_load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, llama_model&, llama_model_params&) + 1968
    frame #4: 0x000000010003ca08 main`llama_load_model_from_file + 420
    frame #5: 0x00000001000a208c main`llama_init_from_gpt_params(gpt_params&) + 96
    frame #6: 0x00000001000ed73c main`main + 2404
    frame #7: 0x0000000187ee90e0 dyld`start + 2360
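Frame #2 suggests that a key lookup inside `llama_byte_to_token` is what throws. Below is a minimal sketch of my guess at the failure mode, assuming the loader looks up a byte-fallback token such as `<0x0A>` by its text in a text-to-id map. The helper names and the tiny vocabularies are made up for illustration; this is not the actual llama.cpp code.

```python
# Hypothetical sketch, not llama.cpp code: mimics a text -> id vocabulary lookup.

def byte_token_text(byte):
    """Spell a byte the way SentencePiece-style byte-fallback tokens are usually written."""
    return f"<0x{byte:02X}>"

def byte_to_token(token_to_id, byte):
    # Indexing a dict raises KeyError on a missing key, much like
    # std::unordered_map::at throws std::out_of_range in C++.
    return token_to_id[byte_token_text(byte)]

def lookup_newline(token_to_id):
    """Return the id of the newline byte token, or None if the vocab has none."""
    try:
        return byte_to_token(token_to_id, ord("\n"))
    except KeyError:
        return None  # in the C++ binary this would be the uncaught exception

# Made-up vocab that starts like the metadata dump ("!", "\"", "#", ...)
# and contains no "<0x..>" entries.
vocab_without_bytes = {"!": 0, "\"": 1, "#": 2}
# Made-up vocab that does carry a byte-fallback token for "\n".
vocab_with_bytes = {"<0x0A>": 13, "!": 0}

print(lookup_newline(vocab_with_bytes))     # 13
print(lookup_newline(vocab_without_bytes))  # None
```

If the converted vocab really has no `<0x..>` tokens, that would be consistent with the `key not found` message, but I have not confirmed this against the llama.cpp source.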
full lldb output from `./main`:
(lldb) target create "./main"
Current executable set to '/Users/tito/code/llama.cpp/main' (arm64).
(lldb) settings set -- target.run-args  "-m" "/Users/tito/code/autogguf/OpenCodeInterpreter-DS-6.7B/opencodeinterpreter-ds-6.7b.Q4_K_M.gguf" "-t" "7" "--color" "--ctx_size" "4096" "--keep" "4" "--in-prefix" "<|User|>\\n" "--in-suffix" "\\n<|Assistant|>\\n" "-r" "<|User|>" "-r" "<|Assistant|>" "-r" "<|EOT|>" "-ins" "-b" "512" "-n" "-1" "--temp" "0.7" "--repeat_penalty" "1.1" "-ngl" "0"
(lldb) breakpoint set -E C++
Breakpoint 1: no locations (pending).
(lldb) run
Process 25487 launched: '/Users/tito/code/llama.cpp/main' (arm64)
2 locations added to breakpoint 1
Log start
main: build = 2249 (15499eb9)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.3.0
main: seed  = 1708707124
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from /Users/tito/code/autogguf/OpenCodeInterpreter-DS-6.7B/opencodeinterpreter-ds-6.7b.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 100000.000000
llama_model_loader: - kv  11:                    llama.rope.scaling.type str              = linear
llama_model_loader: - kv  12:                  llama.rope.scaling.factor f32              = 4.000000
llama_model_loader: - kv  13:                          general.file_type u32              = 15
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32256]   = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32256]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32256]   = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 32013
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 32021
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 32014
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {%- set found_item = false -%}\n{%- fo...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
Process 25487 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
->  0x188223330 <+0>:  pacibsp
    0x188223334 <+4>:  stp    x22, x21, [sp, #-0x30]!
    0x188223338 <+8>:  stp    x20, x19, [sp, #0x10]
    0x18822333c <+12>: stp    x29, x30, [sp, #0x20]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
    frame #1: 0x00000001000684c0 main`std::__1::__throw_out_of_range[abi:v160006](char const*) + 60
    frame #2: 0x000000010006a790 main`llama_byte_to_token(llama_vocab const&, unsigned char) + 472
    frame #3: 0x000000010003d270 main`llama_model_load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, llama_model&, llama_model_params&) + 1968
    frame #4: 0x000000010003ca08 main`llama_load_model_from_file + 420
    frame #5: 0x00000001000a208c main`llama_init_from_gpt_params(gpt_params&) + 96
    frame #6: 0x00000001000ed73c main`main + 2404
    frame #7: 0x0000000187ee90e0 dyld`start + 2360
conversion info
$ python3.11 ./convert.py OpenCodeInterpreter-DS-6.7B \
  --outtype f16 \
  --outfile opencodeinterpreter-ds-6.7b.fp16.gguf \
  --vocab-type hfft \
  --pad-vocab
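Since the crash happens in vocab handling, one thing worth checking is which tokenizer type the original checkpoint declares, to compare it against the `--vocab-type` passed above. This is a sketch assuming the model directory contains a Hugging Face `tokenizer.json` whose `model.type` field names the tokenizer (for example `BPE` or `Unigram`). The function names and the sample data are made up for illustration.

```python
import json
from pathlib import Path

def tokenizer_model_type(tokenizer_json):
    """Return the 'model.type' field of a parsed tokenizer.json, or None if absent."""
    return tokenizer_json.get("model", {}).get("type")

def load_tokenizer_model_type(model_dir):
    # Assumes <model_dir>/tokenizer.json exists; raises FileNotFoundError otherwise.
    path = Path(model_dir) / "tokenizer.json"
    with path.open(encoding="utf-8") as f:
        return tokenizer_model_type(json.load(f))

# Made-up sample standing in for a real tokenizer.json.
sample = {"model": {"type": "BPE", "vocab": {"!": 0}}}
print(tokenizer_model_type(sample))  # BPE
```

On the real checkpoint this would be called as `load_tokenizer_model_type("OpenCodeInterpreter-DS-6.7B")`, run from the directory used in the conversion command.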