
@June24-Wu (Contributor) commented Sep 7, 2025

New Question: Attention with Linear Biases

The question is based on the paper "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" (https://arxiv.org/pdf/2108.12409).

@@ -0,0 +1,107 @@
<p> Implement Attention with Linear Biases (ALiBi) for a given set of matrices.
Given the query matrix <code>Q</code> of size <code>M×d</code>, key matrix <code>K</code> of size <code>N×d</code>, and value matrix
<code>V</code> of size <code>N×d</code>, your program should compute the output matrix using the formula:
Contributor:

would be nice to link the paper in the spec

$$\text{Attention}_{ALiBi}(Q, K, V) = \text{softmax}\Bigl( \frac{QK^T}{\sqrt{d}} + \alpha \cdot \Delta \Bigr)V$$
</p>
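A plain CPU reference for the formula above can be handy for checking a GPU kernel's output. The sketch below is a minimal version under stated assumptions: matrices are row-major lists of lists, α is a single scalar, Δ<sub>ij</sub> = i − j with 0-based indices, and no causal mask is applied (the spec excerpt shows none). The function name `alibi_attention` is hypothetical and is not part of the challenge's harness.

```python
import math
from typing import List

Matrix = List[List[float]]


def alibi_attention(Q: Matrix, K: Matrix, V: Matrix, alpha: float) -> Matrix:
    """Hypothetical CPU reference: softmax(Q K^T / sqrt(d) + alpha * Delta) V.

    Assumes Delta[i][j] = i - j (0-based), a single scalar alpha, and no mask.
    Q is M x d, K and V are N x d; the result is M x d.
    """
    M, N = len(Q), len(K)
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for i in range(M):
        # Scaled dot-product score of query i against each key j, plus the linear bias.
        scores = [
            sum(Q[i][k] * K[j][k] for k in range(d)) * scale + alpha * (i - j)
            for j in range(N)
        ]
        # Numerically stable softmax: subtract the row maximum before exponentiating.
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output row i is the softmax-weighted sum of the value rows.
        out.append([sum(weights[j] * V[j][c] for j in range(N)) for c in range(len(V[0]))])
    return out
```

For example, with a zero query, keys `[[1, 0], [0, 1]]`, values `[[1, 2], [3, 4]]` and α = ln 3, the two biases are 0 and −ln 3, so the weights are 0.75 and 0.25 and the output row is `[1.5, 2.5]`. Subtracting the row maximum leaves the softmax unchanged but avoids overflow in `exp`; a GPU kernel would typically apply the same trick per query row.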
Contributor:

52 is taken, do 55


<p>
where &alpha; is a slope controlling the linear bias and <code>&Delta; = i - j</code> represents the relative position between query <code>i</code> and key <code>j</code>.
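As a small worked example, assuming 0-based indices (the spec does not state the indexing), the bias for M = N = 3 is:

$$\Delta = \begin{pmatrix} 0 & -1 & -2 \\ 1 & 0 & -1 \\ 2 & 1 & 0 \end{pmatrix}, \qquad \alpha \cdot \Delta = \begin{pmatrix} 0 & -\alpha & -2\alpha \\ \alpha & 0 & -\alpha \\ 2\alpha & \alpha & 0 \end{pmatrix}$$

Keys before the query (j < i) and keys after it (j > i) receive biases of opposite sign. In the paper, the bias for a key j ≤ i is m·(j − i) with m > 0, so reproducing that distance penalty on past keys would correspond to a negative α in this formulation.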
Contributor:

change the dir name from alibi to attn_w_linear_bias

@June24-Wu (Contributor Author), Sep 12, 2025:

See the new commit, which adds the paper link and the new dir name.

@June24-Wu changed the title from "[Feat] New Question: Attention with Linear Biases" to "[New Question] Attention with Linear Biases (Medium)" on Sep 13, 2025
@kunal-mansukhani merged commit 4526e64 into AlphaGPU:main on Sep 16, 2025
1 check passed