@ngxson Thank you for your new feature. After trying the new mtmd, I ran into a few problems.
Name and Version
My llama version:
```
$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5517 (1e8659e)
built with cc (Ubuntu 11.4.0-2ubuntu1~20.04) 11.4.0 for x86_64-linux-gnu
```
- The `else` here only checks whether the model has an audio encoder; if it does not, the convert process fails.
llama.cpp/convert_hf_to_gguf.py
Lines 1181 to 1211 in 26b79b6
```python
    def set_gguf_parameters(self):
        self.gguf_writer.add_file_type(self.ftype)

        if self.has_vision_encoder:
            self.gguf_writer.add_clip_has_vision_encoder(True)
            self.gguf_writer.add_vision_projection_dim(self.n_embd_text)

            # vision config
            self.gguf_writer.add_vision_image_size(self.find_vparam(["image_size"]))
            self.gguf_writer.add_vision_patch_size(self.find_vparam(["patch_size"]))
            self.gguf_writer.add_vision_embedding_length(self.find_vparam(["hidden_size"]))
            self.gguf_writer.add_vision_feed_forward_length(self.find_vparam(["intermediate_size"]))
            self.gguf_writer.add_vision_block_count(self.find_vparam(self.n_block_keys))
            self.gguf_writer.add_vision_head_count(self.find_vparam(["num_attention_heads"]))

            # preprocessor config
            self.gguf_writer.add_vision_image_mean(self.preprocessor_config["image_mean"])
            self.gguf_writer.add_vision_image_std(self.preprocessor_config["image_std"])

        if self.has_audio_encoder:
            self.gguf_writer.add_clip_has_audio_encoder(True)
            self.gguf_writer.add_audio_projection_dim(self.n_embd_text)

            # audio config
            self.gguf_writer.add_audio_embedding_length(self.find_aparam(["hidden_size"]))
            self.gguf_writer.add_audio_feed_forward_length(self.find_aparam(["intermediate_size"]))
            self.gguf_writer.add_audio_block_count(self.find_aparam(self.n_block_keys))
            self.gguf_writer.add_audio_head_count(self.find_aparam(["num_attention_heads"]))

        else:
            raise ValueError("MmprojModel must have either vision or audio encoder")
```
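If I read this right, a model that has only a vision encoder (and no audio encoder) falls into that `else` and raises. A minimal sketch of what I would expect instead, just to illustrate the point; the real fix may look different:

```python
# Sketch only (not the actual upstream code): validate up front instead of
# attaching the else to the audio-encoder branch alone.
def set_gguf_parameters(self):
    self.gguf_writer.add_file_type(self.ftype)

    # fail only when neither encoder is present
    if not (self.has_vision_encoder or self.has_audio_encoder):
        raise ValueError("MmprojModel must have either vision or audio encoder")

    if self.has_vision_encoder:
        self.gguf_writer.add_clip_has_vision_encoder(True)
        # ... vision / preprocessor config as above ...

    if self.has_audio_encoder:
        self.gguf_writer.add_clip_has_audio_encoder(True)
        # ... audio config as above ...
```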
- When I try to quantize the mmproj of Qwen2-VL, it fails with the error message "unknown model architecture: 'clip'":
```
./llama-quantize qwen2.5-vl/mmproj-Qwen2-VL-2B-Instruct F16
main: build = 5517 (1e8659e6)
main: built with cc (Ubuntu 11.4.0-2ubuntu1~20.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/media/wqq/ext4/datasets/qwen2.5-vl/mmproj-Qwen2-VL-2B-Instruct' to '/media/wqq/ext4/datasets/qwen2.5-vl/ggml-model-F16.gguf' as F16
llama_model_loader: loaded meta data with 27 key-value pairs and 520 tensors from /media/wqq/ext4/datasets/qwen2.5-vl/mmproj-Qwen2-VL-2B-Instruct (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = clip
llama_model_loader: - kv 1: general.type str = mmproj
llama_model_loader: - kv 2: general.name str = Qwen2 VL 2B Instruct
llama_model_loader: - kv 3: general.finetune str = 2b-Instruct
llama_model_loader: - kv 4: general.basename str = Qwen2-VL
llama_model_loader: - kv 5: general.size_label str = 665M
llama_model_loader: - kv 6: general.license str = apache-2.0
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2 VL 2B
llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2-VL-2B
llama_model_loader: - kv 11: general.tags arr[str,2] = ["multimodal", "image-text-to-text"]
llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 13: general.file_type u32 = 1
llama_model_loader: - kv 14: clip.has_vision_encoder bool = true
llama_model_loader: - kv 15: clip.vision.projection_dim u32 = 1536
llama_model_loader: - kv 16: clip.vision.image_size u32 = 560
llama_model_loader: - kv 17: clip.vision.patch_size u32 = 14
llama_model_loader: - kv 18: clip.vision.embedding_length u32 = 1280
llama_model_loader: - kv 19: clip.vision.feed_forward_length u32 = 1536
llama_model_loader: - kv 20: clip.vision.block_count u32 = 32
llama_model_loader: - kv 21: clip.vision.attention.head_count u32 = 16
llama_model_loader: - kv 22: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
llama_model_loader: - kv 23: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
llama_model_loader: - kv 24: clip.projector_type str = qwen2vl_merger
llama_model_loader: - kv 25: clip.vision.attention.layer_norm_epsilon f32 = 0.000001
llama_model_loader: - kv 26: general.quantization_version u32 = 2
llama_model_loader: - type f32: 324 tensors
llama_model_loader: - type f16: 196 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
llama_model_quantize: failed to quantize: unknown model architecture: 'clip'
```
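For reference, the rejection apparently comes from the `general.architecture = clip` key in the metadata above; `llama-quantize` only knows text-model architectures. A small sketch (assuming the `GGUFReader` API from the `gguf` pip package; the file path is just an example) to check the architecture of a file before trying to quantize it:

```python
# Sketch: read general.architecture from a GGUF file with the gguf package
# (pip install gguf). The file path below is an example.
from gguf import GGUFReader

reader = GGUFReader("mmproj-Qwen2-VL-2B-Instruct.gguf")
field = reader.fields["general.architecture"]

# For string fields, the value bytes live in the part indexed by data[0]
arch = field.parts[field.data[0]].tobytes().decode("utf-8")
print(arch)  # prints "clip" for an mmproj file, which llama-quantize rejects
```

Is there a supported way to quantize the mmproj, or does it have to be produced at the desired precision during conversion?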
- llama-server can load several LoRAs at once and switch between them. I'm wondering: can each LoRA include its own mmproj?
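For context on the LoRA point: if I read the server docs right, adapters loaded with multiple `--lora` arguments can be re-weighted at runtime through the `/lora-adapters` endpoint, roughly like this (URL, port and adapter ids are just examples):

```python
# Sketch: switch between LoRA adapters at runtime via llama-server's
# /lora-adapters endpoint (server assumed started with several --lora args).
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # example address

def set_lora_scales(scales):
    """POST the desired per-adapter scales, e.g. [{"id": 0, "scale": 1.0}]."""
    req = urllib.request.Request(
        f"{BASE_URL}/lora-adapters",
        data=json.dumps(scales).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# enable adapter 0, disable adapter 1
set_lora_scales([{"id": 0, "scale": 1.0}, {"id": 1, "scale": 0.0}])
```

What I don't see is whether an mmproj can be attached per adapter in the same way, hence the question above.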
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 3060
Models
Qwen2-VL-2B-Instruct