[New Question] Attention with Linear Biases (Medium) #78
Conversation
@@ -0,0 +1,107 @@
<p> Implement Attention with Linear Biases (ALiBi) for a given set of matrices.
Given the query matrix <code>Q</code> of size <code>M×d</code>, key matrix <code>K</code> of size <code>N×d</code>, and value matrix
<code>V</code> of size <code>N×d</code>, your program should compute the output matrix using the formula:
would be nice to link the paper in the spec
Given the query matrix <code>Q</code> of size <code>M×d</code>, key matrix <code>K</code> of size <code>N×d</code>, and value matrix
<code>V</code> of size <code>N×d</code>, your program should compute the output matrix using the formula:
$$\text{Attention}_{ALiBi}(Q, K, V) = \text{softmax}\Bigl( \frac{QK^T}{\sqrt{d}} + \alpha \cdot \Delta \Bigr)V$$
</p>
52 is taken, do 55
<p>
where α is a slope controlling the linear bias and <code>Δ = i - j</code> represents the relative position between query <code>i</code> and key <code>j</code>.
</p>
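A minimal NumPy sketch of the formula in the spec above, computing softmax(QKᵀ/√d + α·Δ)V with Δ[i, j] = i − j. The function name and signature are illustrative, not from this repo's reference solution:

```python
import numpy as np

def alibi_attention(Q, K, V, alpha):
    """Sketch of ALiBi attention per the spec: Q is (M, d), K and V are (N, d).
    Name and signature are hypothetical, not from the repository."""
    M, d = Q.shape
    N = K.shape[0]
    scores = Q @ K.T / np.sqrt(d)                          # (M, N) scaled dot products
    delta = np.arange(M)[:, None] - np.arange(N)[None, :]  # Delta[i, j] = i - j
    scores = scores + alpha * delta                        # add linear bias
    # numerically stable row-wise softmax
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V                                     # (M, d) output
```

With α = 0 this reduces to standard scaled dot-product attention; each output row is a convex combination of the rows of V.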
change the dir name from alibi to attn_w_linear_bias
See the new commit to include paper and new dir name
New Question: Attention with Linear Biases
The question is from the paper: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (https://arxiv.org/pdf/2108.12409)