
Llama3 Branch Still Suffers Segmentation Fault When Generating Datastore Using Qwen2.5 #26


Description

@csAugust

I'm trying to build a datastore for Qwen2.5-series models using DraftRetriever, but I hit a Segmentation fault (core dumped) error when writer.finalize() is called at Line 54 of get_datastore_chat.py. The dataset I used is ShareGPT_Vicuna_unfiltered, the same as the default option.

I'm using "llama3" branch (as it fixed the vocabulary size limit) with python3.9 and the prebuilt wheel. I'm not familiar with Rust, so I will be sincerely appreciated if there would be someone help me out.

To reproduce: just modify get_datastore_chat.py at Line 13 and Line 45 as shown below, then run it with no arguments.

Line 13

parser.add_argument(
    "--model-path",
    type=str,
    default="Qwen/Qwen2.5-0.5B-Instruct",
    # default="lmsys/vicuna-7b-v1.5",
    help="The path to the weights. This can be a local folder or a Hugging Face repo ID.",
)

Line 45

    dataset_path = 'path/to/datasets/ShareGPT_Vicuna_unfiltered/ShareGPT_2023.05.04v0_Wasteland_Edition.json'

Output:

number of samples:  52052
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 52052/52052 [05:09<00:00, 167.97it/s]
Segmentation fault (core dumped)
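
If it helps with locating the fault, enabling Python's faulthandler before the run at least dumps the Python-level stack when the native extension crashes; this is a generic diagnostic sketch, nothing DraftRetriever-specific:

# Add near the top of get_datastore_chat.py: prints the Python traceback
# on SIGSEGV so the crash can be tied to the writer.finalize() call.
import faulthandler

faulthandler.enable()

For the native (Rust) side, running the script under gdb (gdb --args python get_datastore_chat.py, then run and bt after the crash) shows the faulting frames, though symbol names depend on how the wheel was built.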

Thanks a lot.
