Qualcomm AI Engine Direct - Observer Fix and remove unused passes #6225

Closed

Conversation

winskuo-quic
Collaborator

Summary

  • ConvertToLinear() is redundant in qnn_preprocess.py since this pass is already called in executorch/backends/qualcomm/utils/utils.py

  • Some models are experiencing a significant drop in accuracy, with a few models having 0% accuracy. We add new conditions to perform requantization and change ptq_per_channel_quant_config's IO observer from MinMaxObserver to MovingAverageMinMaxObserver to resolve the issue.

  1. Why add new conditions to do requantization? We noticed a behavior change in a PyTorch PR (pytorch/pytorch@b8eef50#diff-976c3b0c6f85048d3db01a0c394ce8eb16e2f7541f0983d0f4ef549baa4be822L152). Before that PR, two quantization specs were considered the same if only their dtype and is_dynamic matched. After that change, more attributes such as scale and zero_point are also compared, which leaves some nodes with an extra pair of QDQ nodes. As shown in the image below, there are 2 pairs of QDQ nodes after the PyTorch PR, and these 2 pairs have different scale and offset. During the QNN lowering process, a node only saves the quant info immediately following its output. For example, the cat op below will use quantize_per_tensor_default_18's scale and offset as the node's quant attribute, and all other quant and dequant nodes will be ignored.
    This causes an accuracy drop, but by inserting a requantize node we see an improvement in accuracy for most models. Taking inceptionv3 as an example, the average top-1 accuracy goes from 0% to ~75%. I have checked a couple of other models, and their accuracy either stays the same or improves.

I have also provided an option for users to skip this requant optimization if they prefer not to use it.

Before:
image


After:
image
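
To make the qspec-comparison change concrete, here is a minimal, self-contained sketch. QSpec, specs_equal_before_pr, specs_equal_after_pr, and requantize are hypothetical, simplified stand-ins for illustration only, not the actual PyTorch/ExecuTorch APIs:

```python
from dataclasses import dataclass

@dataclass
class QSpec:
    # Hypothetical, simplified quantization spec; the real qspec
    # carries more fields (qscheme, quant_min/quant_max, etc.).
    dtype: str
    is_dynamic: bool
    scale: float
    zero_point: int

def specs_equal_before_pr(a: QSpec, b: QSpec) -> bool:
    # Pre-PR behavior: only dtype and is_dynamic were compared, so specs
    # with different scale/zero_point still collapsed into one QDQ pair.
    return a.dtype == b.dtype and a.is_dynamic == b.is_dynamic

def specs_equal_after_pr(a: QSpec, b: QSpec) -> bool:
    # Post-PR behavior: scale and zero_point are compared too,
    # which leaves two distinct QDQ pairs in the graph.
    return (specs_equal_before_pr(a, b)
            and a.scale == b.scale
            and a.zero_point == b.zero_point)

def requantize(q_value: int, src: QSpec, dst: QSpec) -> int:
    # What an inserted requantize node does: dequantize with the
    # producer's params, then re-quantize with the consumer's params.
    real = (q_value - src.zero_point) * src.scale
    return round(real / dst.scale) + dst.zero_point

producer = QSpec("uint8", False, scale=0.02, zero_point=128)
consumer = QSpec("uint8", False, scale=0.05, zero_point=120)

print(specs_equal_before_pr(producer, consumer))  # True: treated as one pair
print(specs_equal_after_pr(producer, consumer))   # False: requantize needed
print(requantize(178, producer, consumer))        # 140
```

Under the pre-PR check the producer and consumer specs look identical, so only one QDQ pair survives; under the post-PR check they differ, which is why this PR inserts a requantize node between them instead of letting the mismatched quant attributes be silently dropped.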

  2. Why change ptq_per_channel_quant_config's IO from MinMaxObserver to MovingAverageMinMaxObserver?
    After the above change, there is an inference speed drop due to requantization. By switching to MovingAverageMinMaxObserver, I observed an improvement in inference speed for some models, such as inceptionv3.
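
The difference between the two observers can be illustrated without torch. The helpers below are hypothetical pure-Python analogues of torch.ao.quantization's MinMaxObserver (absolute running min/max) and MovingAverageMinMaxObserver (exponential moving average of per-batch min/max; 0.01 is PyTorch's default averaging_constant):

```python
def minmax_range(batches):
    """Track the absolute min/max ever seen (MinMaxObserver style)."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

def moving_average_range(batches, averaging_constant=0.01):
    """Exponential moving average of per-batch min/max
    (MovingAverageMinMaxObserver style)."""
    lo = hi = None
    for b in batches:
        b_lo, b_hi = min(b), max(b)
        if lo is None:
            lo, hi = b_lo, b_hi
        else:
            lo += averaging_constant * (b_lo - lo)
            hi += averaging_constant * (b_hi - hi)
    return lo, hi

# Ten typical calibration batches plus one outlier batch:
batches = [[-1.0, 1.0]] * 10 + [[-8.0, 8.0]]
print(minmax_range(batches))          # (-8.0, 8.0): range blown up by outlier
print(moving_average_range(batches))  # roughly (-1.07, 1.07): outlier damped
```

Because the moving average damps rare outlier batches, the calibrated range is tighter and the resulting scale finer; the source only reports that this observer choice improved inference speed on some models, so the sketch above shows the range behavior, not the speed mechanism.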


pytorch-bot bot commented Oct 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6225

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit acc6f6d with merge base 1f2b9aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 15, 2024
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR consists of 2 updates:

  1. Removing an unused pass in qnn_preprocess.
  2. Fixing a significant accuracy drop on mainline for some models.
    Please refer to the summary section above for more info.
    Thanks

@cccclai
Contributor

cccclai commented Oct 15, 2024

Hi, thanks for sending the fix PR!

We discussed internally regarding the index put node (#4627), and realized that these non-compute nodes ideally shouldn't be annotated, because they just move data around and don't do actual compute. Annotating them may cause a slight regression (maybe not too much), but ideally we should annotate computation operators only.

I think it's better to have this PR in, given that some models' accuracy drops to 0, but it would still be better to resolve the above issue as well, and hopefully this helps with the index put node issue.

Contributor

@cccclai cccclai left a comment


Left the comment in the github pr

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@winskuo-quic
Collaborator Author

> Hi, thanks for sending the fix PR!
>
> We discussed internally regarding the index put node (#4627), and realized that these non-compute nodes ideally shouldn't be annotated, because they just move data around and don't do actual compute. Annotating them may cause a slight regression (maybe not too much), but ideally we should annotate computation operators only.
>
> I think it's better to have this PR in, given that some models' accuracy drops to 0, but it would still be better to resolve the above issue as well, and hopefully this helps with the index put node issue.

Thanks for reviewing and for the feedback!
I don't think this PR will have a direct impact on the issue you mentioned, since the issue this PR fixes focuses on requantization. However, I will keep an eye on that issue and see whether it can be resolved.

@facebook-github-bot
Contributor

@cccclai merged this pull request in dc4be7c.
