Closed
I'm trying to build a datastore for Qwen2.5 series models using DraftRetriever, but I get a "Segmentation fault (core dumped)" error when calling writer.finalize() at line 54 of the script get_datastore_chat.py. The dataset I used is ShareGPT_Vicuna_unfiltered, the same as the default option.
I'm using the "llama3" branch (since it fixes the vocabulary size limit) with Python 3.9 and the prebuilt wheel. I'm not familiar with Rust, so I would sincerely appreciate it if someone could help me out.
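Since the vocabulary size limit is what led me to the "llama3" branch, here is a minimal sketch of the check I have in mind: how many bytes are needed to store every token id for a given vocabulary size. The helper name and the two vocabulary sizes are my own illustrative assumptions; I do not know how DraftRetriever actually encodes token ids internally.

```python
def min_bytes_per_token(vocab_size: int) -> int:
    """Return the smallest whole number of bytes that can hold every id in [0, vocab_size)."""
    if vocab_size < 1:
        raise ValueError("vocab_size must be positive")
    max_id = vocab_size - 1
    # bit_length() is 0 for max_id == 0, so clamp to at least one byte.
    return max(1, (max_id.bit_length() + 7) // 8)


# Assumed vocabulary sizes, for illustration only (not read from any model config).
assumed_vocab_sizes = {
    "32k-token vocabulary": 32_000,
    "152k-token vocabulary": 151_936,
}

for name, vocab_size in assumed_vocab_sizes.items():
    print(f"{name}: {min_bytes_per_token(vocab_size)} bytes per token id")
```

If the datastore format assumes 2 bytes per id, any vocabulary above 65,536 entries would not fit, which is why I want to rule this out first.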
To reproduce:
Modify lines 13 and 45 of get_datastore_chat.py as follows, then run the script with no arguments.
Line 13
parser.add_argument(
    "--model-path",
    type=str,
    default="Qwen/Qwen2.5-0.5B-Instruct",
    # default="lmsys/vicuna-7b-v1.5",
    help="The path to the weights. This can be a local folder or a Hugging Face repo ID.",
)
Line 45
dataset_path = 'path/to/datasets/ShareGPT_Vicuna_unfiltered/ShareGPT_2023.05.04v0_Wasteland_Edition.json'
Output:
number of samples: 52052
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 52052/52052 [05:09<00:00, 167.97it/s]
Segmentation fault (core dumped)
Thanks a lot.
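In case it helps anyone narrow this down, below is a rough sketch of how the crash location could be captured with Python's standard faulthandler module. It only records the Python-level stack at the moment of the fault, so it would confirm which Python call was active but would not show what happens inside the Rust extension. The log file name is a placeholder of my own.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
trace_path = os.path.join(tempfile.gettempdir(), "datastore_segfault_trace.log")

# The file object must stay open for the life of the process,
# because faulthandler writes to its file descriptor when a fatal signal arrives.
trace_file = open(trace_path, "w")

# Dump the Python traceback of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
faulthandler.enable(file=trace_file, all_threads=True)

print(f"Python traceback on a fatal signal will be written to {trace_path}")
```

Placing these lines at the top of get_datastore_chat.py, or running the unmodified script with python -X faulthandler, should have the same effect.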