Qualcomm AI Engine Direct - Observer Fix and remove unused passes #6225

winskuo-quic · 2024-10-15T08:57:38Z

Summary

ConvertToLinear() is redundant in qnn_preprocess.py because this pass is already called in executorch/backends/qualcomm/utils/utils.py.
Some models are experiencing a significant drop in accuracy, with a few dropping to 0% accuracy. This PR adds new conditions for performing requantization and changes the IO observer of ptq_per_channel_quant_config from MinMaxObserver to MovingAverageMinMaxObserver to resolve the issue.

Why add new conditions for requantization? We noticed this change in a PyTorch PR (pytorch/pytorch@b8eef50#diff-976c3b0c6f85048d3db01a0c394ce8eb16e2f7541f0983d0f4ef549baa4be822L152). Before that PR, the quantization spec check decided whether 2 qspecs were the same by comparing only dtype and is_dynamic. After the change, it also compares more attributes, such as scale and zero_point. As a result, some nodes get an extra pair of QDQ nodes. As shown in the image below, there are 2 pairs of QDQ nodes after the PyTorch PR, and these 2 pairs have different scales and offsets. In the QNN lowering process, a node only saves the quant info found right after its output. For example, the cat op below uses quantize_per_tensor_default_18's scale and offset as the node's quant attribute, and all other quant and dequant nodes are ignored.
This causes an accuracy drop, but inserting a requantize node improves accuracy for most models. Taking inceptionv3 as an example, the average top-1 accuracy goes from 0% to ~75%. I have checked a couple of other models and saw accuracy either stay the same or improve.
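To illustrate why reading a tensor with the wrong scale and offset hurts accuracy, here is a minimal pure-Python sketch of per-tensor affine quantization and a requantize step. It is not the code from this PR: `QParams`, `quantize`, `dequantize`, `requantize`, and the scale/offset values are all hypothetical names and numbers chosen for illustration, assuming an unsigned 8-bit range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QParams:
    """Hypothetical per-tensor affine quantization parameters (uint8 assumed)."""
    scale: float
    zero_point: int
    qmin: int = 0
    qmax: int = 255


def quantize(x: float, p: QParams) -> int:
    # Map a real value to an integer code and clamp it to the valid range.
    q = round(x / p.scale) + p.zero_point
    return max(p.qmin, min(p.qmax, q))


def dequantize(q: int, p: QParams) -> float:
    # Map an integer code back to the real value it represents.
    return (q - p.zero_point) * p.scale


def requantize(q: int, src: QParams, dst: QParams) -> int:
    # Re-express a code produced with `src` in the encoding `dst`.
    return quantize(dequantize(q, src), dst)


# Two QDQ pairs with different (made-up) scale and offset.
producer = QParams(scale=0.02, zero_point=0)
consumer = QParams(scale=0.05, zero_point=10)

real_value = 1.0
q_producer = quantize(real_value, producer)

# Interpreting the producer's code with the consumer's parameters
# silently changes the value the tensor represents.
misread = dequantize(q_producer, consumer)

# Requantizing first keeps the represented value intact.
q_consumer = requantize(q_producer, producer, consumer)
recovered = dequantize(q_consumer, consumer)

print(f"misread={misread:.2f} recovered={recovered:.2f}")
```

In this toy setup the misread value is twice the original, while the requantized code decodes back to the original value. A real backend would typically fold the dequantize/quantize pair into a single integer rescale; the float round trip here is only meant to keep the arithmetic easy to follow.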

I have also provided an option for users to skip this requant optimization if they prefer not to use it.

Before:

After

Why change ptq_per_channel_quant_config's IO from MinMaxObserver to MovingAverageMinMaxObserver?
After the above change, there appears to be an inference speed drop due to requantization. After switching to MovingAverageMinMaxObserver, I observed improved inference speed for some models, such as inceptionv3.
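For readers unfamiliar with the two observers, the sketch below contrasts their range-tracking rules in plain Python. It is a simplified stand-in for the PyTorch observers, not their implementation: the class names `MinMaxRange` and `MovingAverageRange` and the calibration batches are hypothetical, and the averaging constant of 0.01 is assumed to match the PyTorch default. It only shows how the recorded ranges differ; it does not by itself explain the inference speed difference reported above.

```python
class MinMaxRange:
    """Keeps the global min/max over every calibration batch seen so far."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def update(self, batch):
        lo, hi = min(batch), max(batch)
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)


class MovingAverageRange:
    """Moves the range a small step toward each new batch's min/max."""

    def __init__(self, averaging_constant=0.01):
        self.averaging_constant = averaging_constant
        self.min_val = None
        self.max_val = None

    def update(self, batch):
        lo, hi = min(batch), max(batch)
        if self.min_val is None:
            # The first batch initializes the range directly.
            self.min_val, self.max_val = lo, hi
        else:
            c = self.averaging_constant
            self.min_val += c * (lo - self.min_val)
            self.max_val += c * (hi - self.max_val)


# Made-up calibration data: the third batch contains one large outlier.
batches = [[-1.0, 1.0], [-1.0, 1.0], [-1.0, 50.0], [-1.0, 1.0]]

minmax = MinMaxRange()
moving = MovingAverageRange()
for batch in batches:
    minmax.update(batch)
    moving.update(batch)

print(f"min-max range:        [{minmax.min_val}, {minmax.max_val}]")
print(f"moving-average range: [{moving.min_val:.4f}, {moving.max_val:.4f}]")
```

With these batches the min-max tracker keeps the outlier as its upper bound, while the moving-average tracker stays close to the typical range, which in turn yields a different scale and offset for the observed tensor.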

pytorch-bot · 2024-10-15T08:57:41Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6225

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit acc6f6d with merge base 1f2b9aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

winskuo-quic · 2024-10-15T09:00:31Z

Hi @cccclai,
This PR consists of 2 updates:

Removing an unused pass in qnn_preprocess.
Fixing the significant accuracy drop that some models have on mainline.
Please refer to the summary section above for more info.
Thanks

cccclai · 2024-10-15T17:36:59Z

Hi, thanks for sending the fix PR!

We discussed the index put node (#4627) internally and realized that these non-compute nodes ideally shouldn't be annotated, because they just move data around and don't do actual compute. Annotating them may cause a small regression (maybe not too much), but ideally we would annotate compute operators only.

I think it's better to have this PR in, given that accuracy drops to 0 for some models, but it would still be better to resolve the above issue, and hopefully that helps with the index put node issue.

cccclai

Left the comment in the GitHub PR.

facebook-github-bot · 2024-10-15T18:17:25Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

winskuo-quic · 2024-10-16T13:03:20Z

Thanks for the review and the feedback!
I think this PR will not have a direct impact on the issue you mentioned, since the issue this PR fixes concerns requantization. However, I will keep an eye on the issue you mentioned and see whether it can be resolved.

facebook-github-bot · 2024-10-16T18:28:50Z

@cccclai merged this pull request in dc4be7c.

Observer Fix and remove unused passes

acc6f6d

facebook-github-bot added the CLA Signed label Oct 15, 2024

cccclai approved these changes Oct 15, 2024

facebook-github-bot closed this in dc4be7c Oct 16, 2024

facebook-github-bot added the Merged label Oct 16, 2024
