
Llama3 Branch Still Suffers Segmentation Fault When Generating Datastore Using Qwen2.5 #26


Description

@csAugust

I'm trying to build a datastore for Qwen2.5-series models using DraftRetriever, but I hit a Segmentation fault (core dumped) error when writer.finalize() is called at Line 54 of get_datastore_chat.py. The dataset I used is ShareGPT_Vicuna_unfiltered, the same as the default option.

I'm using "llama3" branch (as it fixed the vocabulary size limit) with python3.9 and the prebuilt wheel. I'm not familiar with Rust, so I will be sincerely appreciated if there would be someone help me out.

To reproduce: just modify get_datastore_chat.py at Line 13 and Line 45 as shown below, then run it with no arguments.

Line 13

parser.add_argument(
    "--model-path",
    type=str,
    default="Qwen/Qwen2.5-0.5B-Instruct",
    # default="lmsys/vicuna-7b-v1.5",
    help="The path to the weights. This can be a local folder or a Hugging Face repo ID.",
)

Line 45

    dataset_path = 'path/to/datasets/ShareGPT_Vicuna_unfiltered/ShareGPT_2023.05.04v0_Wasteland_Edition.json'

Output:

number of samples:  52052
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 52052/52052 [05:09<00:00, 167.97it/s]
Segmentation fault (core dumped)
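
If it helps with locating the fault, enabling Python's faulthandler before the run at least dumps the Python-level stack when the native extension crashes; this is a generic diagnostic sketch, nothing DraftRetriever-specific:

# Add near the top of get_datastore_chat.py: prints the Python traceback
# on SIGSEGV so the crash can be tied to the writer.finalize() call.
import faulthandler

faulthandler.enable()

For the native (Rust) side, running the script under gdb (gdb --args python get_datastore_chat.py, then run and bt after the crash) shows the faulting frames, though symbol names depend on how the wheel was built.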

Thanks a lot.
