[NPU] Add `mixed_precision` for Qwen2 7B #12098

Oscilloscope98 · 2024-09-20T06:54:37Z

Description

https://github.com/analytics-zoo/nano/issues/1633#issuecomment-2363009566

Support `mixed_precision` in the `from_pretrained` function for NPU:

- If `mixed_precision=True` and `load_in_low_bit='sym_int4'`, Qwen2 7B uses INT8 for `lm_head`.
- A model saved with `mixed_precision=True` or `mixed_precision=False` keeps the same setting when the saved model is loaded with `load_low_bit`.
- Disable the `lm_head` split when `load_in_low_bit='sym_int8'`.
- Update the example accordingly.
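The rules above can be sketched in plain Python. This is only an illustration of the behavior described in this PR, not the code it adds: `plan_lm_head`, `save_plan`, `load_plan`, and the `npu_plan.json` file name are hypothetical, and the fallback for cases the description does not mention is an assumption.

```python
import json
import os
import tempfile


def plan_lm_head(load_in_low_bit: str, mixed_precision: bool) -> dict:
    """Hypothetical helper that mirrors the rules listed in the description."""
    if load_in_low_bit == "sym_int4" and mixed_precision:
        # Stated rule: sym_int4 weights with mixed_precision=True use INT8 for lm_head.
        lm_head_low_bit = "sym_int8"
    else:
        # Assumption: otherwise lm_head follows the model-wide low-bit format.
        lm_head_low_bit = load_in_low_bit
    return {
        "load_in_low_bit": load_in_low_bit,
        "mixed_precision": mixed_precision,
        "lm_head_low_bit": lm_head_low_bit,
        # Stated rule: the lm_head split is disabled for sym_int8.
        # False only means "not forced off by this rule" in this sketch.
        "lm_head_split_disabled": load_in_low_bit == "sym_int8",
    }


def save_plan(plan: dict, save_dir: str) -> None:
    """Persist the options next to the (hypothetical) saved low-bit model."""
    with open(os.path.join(save_dir, "npu_plan.json"), "w") as f:
        json.dump(plan, f)


def load_plan(save_dir: str) -> dict:
    """Reload the options so the saved mixed_precision choice is kept."""
    with open(os.path.join(save_dir, "npu_plan.json")) as f:
        return json.load(f)


# Round trip: the option chosen at save time is what the loader sees.
with tempfile.TemporaryDirectory() as tmp_dir:
    save_plan(plan_lm_head("sym_int4", mixed_precision=True), tmp_dir)
    restored = load_plan(tmp_dir)
```

Storing the option alongside the saved model is one simple way to make the reload path independent of the arguments passed later; how the PR actually persists it is not shown in this conversation.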

jason-dai

LGTM

python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md

rnwang04

others LGTM

python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md

Oscilloscope98 added 6 commits September 20, 2024 11:46

Add mix_precision argument to control whether use INT8 lm_head for Qwen2-7B-Instruct

e9d6ba1

Small fix

7157cc7

Fixed on load low bit with mixed precision

3d1def8

Small fix

0dada1f

Update example accordingly

43164d1

Update for default prompt

5a17b67

Oscilloscope98 requested review from jason-dai and rnwang04 September 20, 2024 07:23

jason-dai approved these changes Sep 20, 2024

rnwang04 reviewed Sep 20, 2024

python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md (outdated, resolved)

Update base on comments

9ab579a

rnwang04 approved these changes Sep 20, 2024

rnwang04 reviewed Sep 20, 2024

python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md (outdated, resolved)

Final fix

c29eb59

Oscilloscope98 merged commit 828fa01 into intel:main Sep 20, 2024
