@Oscilloscope98 Oscilloscope98 commented Sep 20, 2024

Description

https://github.com/analytics-zoo/nano/issues/1633#issuecomment-2363009566

Support mixed_precision in from_pretrained function for NPU

  • If mixed_precision=True and load_in_low_bit='sym_int4', Qwen2 7B uses INT8 for lm_head
  • A model saved with mixed_precision=True or mixed_precision=False keeps the same setting when the saved model is reloaded with load_low_bit
  • Disable the lm_head split when load_in_low_bit='sym_int8'
  • Update the example accordingly
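
The rules listed above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper names (`lm_head_precision`, `lm_head_split_enabled`, `save_options`, `load_options`) are hypothetical, the sketch ignores the model type (the description only mentions Qwen2 7B), and treating the lm_head split as enabled for every value other than 'sym_int8' is an assumption.

```python
import json


def lm_head_precision(load_in_low_bit: str, mixed_precision: bool) -> str:
    """Hypothetical helper: choose the precision used for lm_head."""
    # Per the description: mixed precision with sym_int4 weights puts lm_head in INT8.
    if mixed_precision and load_in_low_bit == "sym_int4":
        return "sym_int8"
    # Otherwise lm_head follows the requested low-bit format.
    return load_in_low_bit


def lm_head_split_enabled(load_in_low_bit: str) -> bool:
    """Hypothetical helper: decide whether lm_head may be split."""
    # The description only says the split is disabled for sym_int8;
    # returning True for everything else is an assumption of this sketch.
    return load_in_low_bit != "sym_int8"


def save_options(load_in_low_bit: str, mixed_precision: bool) -> str:
    """Serialize the options so a later load can reuse them (illustrative only)."""
    return json.dumps(
        {"load_in_low_bit": load_in_low_bit, "mixed_precision": mixed_precision}
    )


def load_options(serialized: str) -> dict:
    """Restore the options stored by save_options."""
    return json.loads(serialized)


if __name__ == "__main__":
    saved = save_options("sym_int4", True)
    restored = load_options(saved)
    print(lm_head_precision(restored["load_in_low_bit"], restored["mixed_precision"]))
    print(lm_head_split_enabled("sym_int8"))
```

Storing mixed_precision next to the low-bit format is what lets a reload pick the same lm_head precision without the caller passing the flag again.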


@jason-dai jason-dai left a comment

LGTM


@rnwang04 rnwang04 left a comment

others LGTM

@Oscilloscope98 Oscilloscope98 merged commit 828fa01 into intel:main Sep 20, 2024