Commit f7620fe

improve language
1 parent 6ac3880 commit f7620fe

File tree

1 file changed: +2 −2 lines changed

torchao/quantization/README.md

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 Typically quantization algorithms will have different schemes for how the activation and weights are quantized so A16W8 for instance means the activations are quantized to 16 bits wheras the weights are quantized to 8 bits. Trying out different quantization schemes in `torchao` is generally a 1 line change. Note: exact APIs are not stable, we may change them in the future.
 
 ## Benchmarks
-Benchmarks are run on a machine with a single A100 GPU using the script in _models/llama, evaluation was done
-Using the lm_eval. The models used were meta-llama/Llama-2-7b-chat-hf and meta-llama/Meta-Llama-3-8B benchmarked for batchsize=1
+Benchmarks are run on a machine with a single A100 GPU using the script in _models/llama which generates text in a latency optimized way (batchsize=1), evaluation was done
+Using the lm_eval. The models used were meta-llama/Llama-2-7b-chat-hf and meta-llama/Meta-Llama-3-8B.
 
 | Model | Technique | wikitext-perplexity | Tokens/Second | Memory Bandwidth (GB/s) | Peak Memory (GB) | Model Size (GB) |
 | ----------- | ------------------ | ------------------- | ------------- | ----------------------- | ---------------- | --------------- |
