
Conversation

tpapp (Owner) commented Sep 30, 2025

Deprecate the ADgradient(::Symbol, ...) and ADgradient(::Val, ...) API. Backends should be defined using ADTypes.jl. This is a breaking change and will require a major version bump once deprecations are removed. However, I think it leads to a much cleaner API.
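To make the change concrete, here is a rough sketch of what it means for callers (ToyNormal is a made-up example problem, not part of the package; the ADTypes-based method is the one provided via the ADTypes extension):

```julia
using ADTypes, ForwardDiff, LogDensityProblems, LogDensityProblemsAD

# a toy log density (standard normal) just to make the sketch self-contained
struct ToyNormal end
LogDensityProblems.logdensity(::ToyNormal, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(::ToyNormal) = 2
LogDensityProblems.capabilities(::Type{ToyNormal}) = LogDensityProblems.LogDensityOrder{0}()
ℓ = ToyNormal()

∇ℓ_old = ADgradient(:ForwardDiff, ℓ)               # deprecated Symbol-based API
∇ℓ_new = ADgradient(ADTypes.AutoForwardDiff(), ℓ)  # backend specified via ADTypes

LogDensityProblems.logdensity_and_gradient(∇ℓ_new, [1.0, 2.0])
```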

This PR is purposefully kept minimal and does not change the implementation unless necessary (see below). That is left for future work, which this PR should make much easier; the plan is to use DifferentiationInterface.jl for what we can, and only do workarounds when absolutely necessary.

However, the test framework was also cleaned up, since the unified API simplifies testing. Various backends were not properly tested for type stability and for accepting generic vector types, and had bugs; these are now fixed.

The shadow kwarg of the Enzyme backend is removed since ADTypes does not support it. However, I don't think it was widely used; frankly, I am not even sure why it was there. The Enzyme forward backend requires type conversions & assertions for stability; this should be investigated, but I think this PR does improve things.

I used this opportunity to deprecate the benchmarking code for ForwardDiff too; it was badly written and I don't think it belongs in this package. It should be removed.

TODO

@tpapp tpapp requested a review from devmotion September 30, 2025 15:10
tpapp (Owner, Author) commented Sep 30, 2025

@gdalle: this is attacking #26/#29 from another angle; once the API is transitioned, I plan to remove special-casing of backends wherever possible and rely on DI. Your input/review would be appreciated.

gdalle (Contributor) commented Sep 30, 2025

I can take a look but I'm still unclear as to where custom backend bindings would be absolutely necessary? The fact that Turing now uses DI directly seems to suggest that it does everything needed for PPLs?

tpapp (Owner, Author) commented Sep 30, 2025

I just asked @wsmoses and he says we should keep them. I guess he knows best, this is beyond my understanding of Enzyme.jl.

gdalle (Contributor) commented Sep 30, 2025

Sounds good. Feel free to take inspiration from DI if you need help with the more subtle aspects of ADTypes, like function annotations for Enzyme.

- better label printing for tests
- fix inference for some cases
- disable inference checks for no prep
gdalle (Contributor) commented Sep 30, 2025

I'll take a look at this PR after JuliaCon Paris

tpapp (Owner, Author) commented Sep 30, 2025

@gdalle: take your time, it is not urgent. I added tests for Mooncake in cffc335; inference fails, but using Base.Fix1 helps, though not in every case. I am wondering if this is a known issue, but could not find it. See the tests in the commit.
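For context, this is roughly the kind of check involved (a sketch reusing the ToyNormal problem from the sketch above; the actual tests are in cffc335, and AutoMooncake support goes through the DI extension):

```julia
using Test, ADTypes, Mooncake, LogDensityProblems, LogDensityProblemsAD

ℓ = ToyNormal()                                   # toy problem from the earlier sketch
x = randn(LogDensityProblems.dimension(ℓ))
∇ℓ = ADgradient(ADTypes.AutoMooncake(; config = nothing), ℓ)

# the property the cleaned-up tests check; wrapping the log density in
# Base.Fix1(LogDensityProblems.logdensity, ℓ) inside the backend is what helps inference
@inferred LogDensityProblems.logdensity_and_gradient(∇ℓ, x)
```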

wsmoses (Contributor) commented Sep 30, 2025

The shadows are required for stability of the forward-mode usage of Enzyme [and will improve performance by not re-generating one-hot vectors].

tpapp (Owner, Author) commented Oct 1, 2025

@wsmoses: thanks for explaining it.

My understanding is that DifferentiationInterface already handles shadows when preparation is used, so simply using that package would work. @gdalle, can you confirm this?

gdalle (Contributor) commented Oct 1, 2025

That statement about DI is true. Whether you want to call Enzyme directly is a different story and up to you; Billy knows best what's good for his package. Still, I'm always happy to fix DI issues if buggy or slow examples are provided, as usual.
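For reference, a minimal sketch of how preparation is used through DI (the function and point are placeholders; per my understanding, this is where the one-hot seeds are built and then reused):

```julia
using ADTypes, DifferentiationInterface, Enzyme

f = x -> -sum(abs2, x) / 2                           # placeholder log density
x = randn(3)
backend = ADTypes.AutoEnzyme(; mode = Enzyme.Forward)

prep = prepare_gradient(f, backend, x)               # seeds/shadows are set up once here
value_and_gradient(f, prep, backend, x)              # reuses the preparation
```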

tpapp (Owner, Author) commented Oct 1, 2025

I think that once we are breaking the API, ADgradient should always just prepare. See #51.

gdalle (Contributor) left a comment

I understand why you kept the Enzyme extension, but is there a specific reason to keep the other backend extensions if you want to switch to DI? What are the things you wish you could do with DI but don't seem to be able to do?

[extensions]
LogDensityProblemsADADTypesExt = "ADTypes"
LogDensityProblemsADDifferentiationInterfaceExt = ["ADTypes", "DifferentiationInterface"]
LogDensityProblemsADDifferentiationInterfaceExt = ["DifferentiationInterface"]
gdalle (Contributor):

DI is a very lightweight package (it depends on nothing except ADTypes and LinearAlgebra); I'm not sure it's worth putting it in an extension.

# active argument must come first in DI
return LogDensityProblemsAD.logdensity(ℓ, x)
end
@inline _logdensity_callable(ℓ) = Base.Fix1(LogDensityProblems.logdensity, ℓ)
gdalle (Contributor):

The reason behind logdensity_switched was to avoid use of Base.Fix1, which is a performance pitfall for Enzyme. The Constant annotation on ℓ could also help speed things up for Mooncake in the future, so I'd suggest leaving this as it was.
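For illustration, the pattern being suggested, as a sketch (Constant and value_and_gradient come from DifferentiationInterface; the toy problem is the one from the first sketch):

```julia
using ADTypes, DifferentiationInterface, Enzyme, LogDensityProblems

ℓ = ToyNormal()                                      # toy problem from the first sketch
x = randn(2)

# active argument first; ℓ is passed as a Constant context instead of being
# captured in a Base.Fix1 closure
logdensity_switched(x, ℓ) = LogDensityProblems.logdensity(ℓ, x)

backend = ADTypes.AutoEnzyme(; mode = Enzyme.Reverse)
value_and_gradient(logdensity_switched, backend, x, Constant(ℓ))
```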

function logdensity_and_gradient(∇ℓ::EnzymeGradientLogDensity{<:Any,<:Enzyme.ForwardMode},
x::AbstractVector)
(; ℓ, mode, shadow) = ∇ℓ
_shadow = shadow === nothing ? Enzyme.onehot(x) : shadow
gdalle (Contributor):

As said by Billy, this is actually useful to store ahead of time.

x::AbstractVector{T}) where T
(; ℓ, mode) = ∇ℓ
result = Enzyme.gradient(mode, Base.Fix1(logdensity, ℓ), x)
T(result.val)::T, collect(T, only(result.derivs))::Vector{T}
gdalle (Contributor):

Why convert everything to Vector?
