Copilot AI commented Nov 19, 2025

The current gelu_tanh implementation uses sigmoid_fast, which prevents Reactant.jl from pattern matching and fusing GELU operations into GEMM calls (see EnzymeAD/Reactant.jl#1420).

Changes

  • Reimplemented gelu_tanh to use the standard paper formula with tanh_fast:

    x/2 * (1 + tanh_fast(√(2/π) * (x + 0.044715x³)))

    This enables compiler pattern matching while maintaining mathematical correctness.

  • Created gelu_sigmoid preserving the old sigmoid-based implementation for users who prefer it:

    x * sigmoid_fast(√(8/π) * x * (1 + 0.044715x²))

  • Updated derivatives for both variants with correct chain rule application

  • Maintained backward compatibility: gelu constant still points to gelu_tanh

Both implementations are mathematically equivalent, since tanh(z) = 2σ(2z) − 1, and their outputs agree to machine precision.
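The equivalence can be spot-checked numerically. The following is a minimal sketch rather than the PR's code: Base `tanh` and a hand-written sigmoid stand in for `tanh_fast` and `sigmoid_fast`, the scale factors are assumed to be √(2/π) and √(8/π), and all `*_sketch` names and upper-case constants are hypothetical.

```julia
# Sketch only: Base.tanh and a hand-written sigmoid stand in for
# tanh_fast and sigmoid_fast; the *_sketch names are hypothetical.
const GELU_LAMBDA = sqrt(2 / pi)    # √(2/π), assumed scale of the tanh form
const GELU_2LAMBDA = sqrt(8 / pi)   # √(8/π) = 2√(2/π), assumed scale of the sigmoid form
const GELU_ALPHA = 0.044715

sigmoid_sketch(x) = 1 / (1 + exp(-x))

# tanh form: x/2 * (1 + tanh(√(2/π) * (x + 0.044715x³)))
gelu_tanh_sketch(x) = x / 2 * (1 + tanh(GELU_LAMBDA * (x + GELU_ALPHA * x^3)))

# sigmoid form: x * σ(√(8/π) * x * (1 + 0.044715x²))
gelu_sigmoid_sketch(x) = x * sigmoid_sketch(GELU_2LAMBDA * x * (1 + GELU_ALPHA * x^2))

# Largest absolute difference between the two forms on a grid over [-6, 6].
xs = range(-6.0, 6.0; length = 241)
max_abs_diff = maximum(abs(gelu_tanh_sketch(x) - gelu_sigmoid_sketch(x)) for x in xs)
println("max |gelu_tanh - gelu_sigmoid| on [-6, 6]: ", max_abs_diff)
```

The two forms coincide because x/2 * (1 + tanh(z)) = x * σ(2z), so doubling the tanh argument turns √(2/π) into √(8/π). Any remaining difference comes from floating-point rounding.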

Warning

Firewall rules blocked me from connecting to the following address:

  • pkg.julialang.org
    • Triggering command: julia -e using Pkg; Pkg.instantiate() (dns block)

Original prompt

This section details the original issue you should resolve

<issue_title>gelu_tanh should actually use tanh</issue_title>
<issue_description>### Motivation and description

Currently gelu_tanh uses sigmoid which prevents us from pattern matching and fusing the gelu into gemm calls for dense layers. See EnzymeAD/Reactant.jl#1420 for details. cc @wsmoses

### Possible Implementation

Rename the current gelu_tanh to gelu_sigmoid. Re-implement gelu_tanh to follow the original paper implementation</issue_description>

Comments on the Issue (you are @copilot in this section)


CarloLucibello marked this pull request as ready for review November 19, 2025 09:13
Comment on lines +352 to +360
α = oftf(x, 0.044715)
α2 = oftf(x, 0.08943)
λ = oftf(x, gelu_λ)
x2 = x * x
t = muladd(x2, α, one(x))
z = λ * x * t
Ω = tanh_fast(z)
sech2 = 1 - Ω^2
(1 + Ω)/2 + x * λ * muladd(x2, α2, t) * sech2 / 2

CarloLucibello commented Nov 19, 2025

@copilot explain this derivation
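
One way to read the commented lines, offered as a sketch rather than as the author's explanation: write f(x) = x/2 * (1 + tanh(z)) with z = λ * x * t and t = 1 + αx². The product rule gives f′(x) = (1 + Ω)/2 + (x/2) * sech²(z) * z′(x), where Ω = tanh(z) and sech²(z) = 1 − Ω². Since z = λ(x + αx³), z′(x) = λ(1 + 3αx²) = λ(t + 2αx²), and 2α = 0.08943 is the constant α2, which is why `muladd(x2, α2, t)` appears. The check below compares that expression against a central finite difference. It assumes gelu_λ equals √(2/π), uses Base `tanh` in place of `tanh_fast`, replaces the `oftf(x, ...)` conversions with plain Float64 constants, and uses hypothetical `*_sketch` names.

```julia
# Sketch only: Base.tanh stands in for tanh_fast, plain Float64 constants
# replace the oftf(x, ...) conversions, and the *_sketch names are hypothetical.
const LAMBDA = sqrt(2 / pi)   # assumed value of gelu_λ, √(2/π)
const ALPHA = 0.044715
const ALPHA2 = 0.08943        # 2 * ALPHA

gelu_tanh_sketch(x) = x / 2 * (1 + tanh(LAMBDA * (x + ALPHA * x^3)))

# Same sequence of steps as the reviewed lines, with dz/dx named explicitly.
function deriv_gelu_tanh_sketch(x)
    x2 = x * x
    t = muladd(x2, ALPHA, one(x))          # t = 1 + αx²
    z = LAMBDA * x * t                     # z = λ(x + αx³)
    Ω = tanh(z)
    sech2 = 1 - Ω^2                        # d/dz tanh(z) = sech²(z)
    dz = LAMBDA * muladd(x2, ALPHA2, t)    # dz/dx = λ(1 + 3αx²)
    (1 + Ω) / 2 + x * dz * sech2 / 2       # product rule on x/2 * (1 + tanh(z))
end

# Central finite difference as an independent reference.
central_diff(f, x; h = 1e-5) = (f(x + h) - f(x - h)) / (2h)

for x in (-3.0, -1.0, 0.0, 0.5, 2.0)
    println(x, "  analytic = ", deriv_gelu_tanh_sketch(x),
            "  finite difference = ", central_diff(gelu_tanh_sketch, x))
end
```

At x = 0 the second term vanishes and the derivative reduces to (1 + tanh(0))/2 = 0.5, which is a quick sanity check on the expression.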

Copilot AI changed the title from "[WIP] Fix gelu_tanh implementation to use tanh instead of sigmoid" to "Refactor gelu_tanh to use tanh for compiler pattern matching" Nov 19, 2025
Copilot AI requested a review from CarloLucibello November 19, 2025 09:18
Copilot finished work on behalf of CarloLucibello November 19, 2025 09:18