Remove preserve_zero and zero_point_domain from choose_qparams_affine #2149
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2149
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures as of commit 214e704 with merge base 212d912.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
force-pushed from 85936a5 to 9780257
@@ -255,7 +254,7 @@ def get_plain(self) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
         target_dtype = torch.int32
         quant_min = 0
         quant_max = 15
-        zero_point_domain = ZeroPointDomain.FLOAT
+        # zero_point_domain is ZeroPointDomain.FLOAT
nit: remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we could keep it for now, as it can be an indicator of the previous implementation.
@@ -1025,7 +1025,6 @@ def get_per_token_block_size(x):
             block_size=block_size,
             target_dtype=target_dtype,
             _layout=_layout,
-            scale_dtype=torch.float32,
should this be reverted?
looks good, thanks @jainapurva for carefully working through this!
zero_point_domain is optional and specifies how we quantize the floating point to quantized data:
    INT: quantized_val = (float_val / scale) (integer) + zero_point (integer)
    FLOAT: quantized_val = (float_val - (zero_point (float) - scale * mid_point)) / scale
    NONE: quantized_val = (float_val / scale) | this is primarily used for floatx quantization,
          where we do not want to round values to the nearest integer and instead scale and cast.
nit: we can just leave the one that is relevant
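To make the three conventions concrete, here is a minimal sketch of the quantize step in each domain. It is an illustration, not torchao's implementation, and it assumes (as in tinygemm-style int4) that `mid_point = (quant_max + quant_min + 1) / 2`:

```python
import torch


def quantize_int_domain(x, scale, zero_point):
    # INT: integer zero point is added after dividing by scale
    return torch.round(x / scale) + zero_point


def quantize_float_domain(x, scale, zero_point, quant_min=0, quant_max=15):
    # FLOAT: zero point lives in floating point space, offset by scale * mid_point
    mid_point = (quant_max + quant_min + 1) / 2  # assumed tinygemm convention
    return torch.round((x - (zero_point - scale * mid_point)) / scale)


def quantize_none_domain(x, scale):
    # NONE: scale and cast only; no rounding to the nearest integer (floatx use case)
    return x / scale
```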
raise ValueError("Please use ZeroPointDomain.NONE instead of None") | ||
elif zero_point_domain is ZeroPointDomain.NONE and zero_point is not None: | ||
raise ValueError("zero_point should be None when zero_point_domain is NONE") | ||
# if zero_point_domain is None: |
nit: please remove the commented code before landing
    quant_max: Union[int, float],
    output_dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    """This function converts AQT tensors to their high precision floating point representation
should we only have doc for non-private helper functions?
I think the docs have to be updated a bit, commented inline
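For context on what a docstring like this describes, here is a minimal sketch of an affine dequantize helper with this signature shape, under the assumption that it simply inverts the INT-domain mapping; the name `_dequantize_affine_sketch` is hypothetical, not the helper in this PR:

```python
import torch
from typing import Union


def _dequantize_affine_sketch(  # hypothetical name, for illustration only
    q: torch.Tensor,
    scale: torch.Tensor,
    zero_point: torch.Tensor,
    quant_min: Union[int, float],
    quant_max: Union[int, float],
    output_dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    # Clamp to the quantized range, then invert the quantize step:
    # float_val = (quantized_val - zero_point) * scale
    q = torch.clamp(q, quant_min, quant_max)
    return ((q - zero_point) * scale).to(output_dtype)
```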
This pull request focuses on refactoring and simplifying quantization-related code by removing unused or redundant functionality and introducing specialized methods for handling specific cases. The most important changes include removing the `preserve_zero` and `zero_point_domain` parameters from many functions, introducing new specialized quantization and dequantization methods, and modifying use-cases accordingly.

Refactoring and Simplification:

Removed the `preserve_zero` and `zero_point_domain` parameters from `choose_qparams_affine`, `quantize_affine`, and `dequantize_affine` calls across multiple files, while introducing specialized methods to handle specific quantization scenarios (an illustrative sketch of one such specialized method appears at the end of this section). The following table contains the new methods:
Notable updates related to the changes:
The following list contains AOBaseConfigs, along with the corresponding choose_qparams_affine function calls made by the backend for each configuration:
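Separately from the table and lists above, here is a hedged sketch of what one such specialized method might encode — the tinygemm-style case that previously required `preserve_zero=False` and `zero_point_domain=ZeroPointDomain.FLOAT` on the generic entry point. The name and exact formulas are illustrative assumptions, not this PR's code:

```python
import torch


def choose_qparams_tinygemm_sketch(  # illustrative name
    x: torch.Tensor, quant_min: int = 0, quant_max: int = 15
) -> tuple[torch.Tensor, torch.Tensor]:
    min_val = x.amin(dim=-1, keepdim=True)
    max_val = x.amax(dim=-1, keepdim=True)
    # No forcing of exact 0.0 representability: the behavior that
    # `preserve_zero=False` used to select on the generic entry point.
    scale = (max_val - min_val) / float(quant_max - quant_min)
    # A floating point zero point centered at the mid point: the behavior
    # `zero_point_domain=ZeroPointDomain.FLOAT` used to select.
    mid_point = (quant_max + quant_min + 1) / 2
    zero_point = min_val + scale * mid_point
    return scale, zero_point
```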