optimizer: allow EA-powered `finalizer` inlining #55954

aviatesk · 2024-10-01T12:45:52Z

E.g. this allows `finalizer` inlining in the following case:

```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```

```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas:

- EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames for latency reasons, which significantly reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.
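As a rough illustration of the second point, the following is a hand-written sketch of what "a `finalize` call after the last use" looks like at the source level. It is not what the optimizer emits. The names `ManualBuffer`, `manual_alloc`, `manual_free_count`, and `sum_first` are hypothetical and exist only for this example, and the counter is atomic only so the sketch itself does not race.

```julia
# Hypothetical names throughout; this mirrors the shape of the ForeignBuffer example.
mutable struct ManualBuffer{T}
    const ptr::Ptr{T}
end

# Atomic counter so the finalizer does not perform a plain (racy) global write.
const manual_free_count = Threads.Atomic{Int}(0)

function manual_alloc(::Type{T}, length::Integer) where T
    raw = Libc.malloc(sizeof(T) * length)
    obj = ManualBuffer{T}(convert(Ptr{T}, raw))
    return finalizer(obj) do o
        Threads.atomic_add!(manual_free_count, 1)
        Libc.free(o.ptr)
    end
end

function sum_first(N::Int)
    workspace = manual_alloc(Float64, N)
    total = 0.0
    GC.@preserve workspace begin
        ptr = workspace.ptr
        for i in 1:N
            unsafe_store!(ptr, Float64(i), i)
        end
        for i in 1:N
            total += unsafe_load(ptr, i)
        end
    end
    # Last use of `workspace` is above: run its finalizer eagerly
    # instead of waiting for the GC to collect the object.
    finalize(workspace)
    return total
end
```

Because `finalize` runs the registered finalizer immediately and unregisters it, the buffer is freed exactly once. If the body throws before reaching `finalize`, the finalizer simply stays registered and the GC runs it later.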

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-10-01T13:44:22Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

vtjnash · 2024-10-01T14:28:49Z

The call `foreign_buffer_finalized[] = true` is a data race if that global is observed anywhere (or if `foreign_alloc` is called twice), and so this finalizer function is probably UB. I don't know how I feel about having that as a test case.

aviatesk · 2024-10-01T15:33:45Z

Would using atomic operations help avoid those data races? If this kind of pattern is inherently dangerous, I'm thinking of limiting the tests to just IR checks and removing the ones that involve execution.
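For concreteness, a minimal sketch of the atomic variant being asked about might look like the following. It only swaps the `Ref(false)` flag for a `Threads.Atomic{Bool}`. Whether that is enough to address the UB concern is exactly the open question here, and the names `AtomicFlagBuffer`, `atomic_flag_finalized`, and `atomic_flag_alloc` are hypothetical, not taken from the PR's tests.

```julia
# Hypothetical names; same shape as the ForeignBuffer example, with an atomic flag.
mutable struct AtomicFlagBuffer{T}
    const ptr::Ptr{T}
end

const atomic_flag_finalized = Threads.Atomic{Bool}(false)

function atomic_flag_alloc(::Type{T}, length::Integer) where T
    raw = Libc.malloc(sizeof(T) * length)
    obj = AtomicFlagBuffer{T}(convert(Ptr{T}, raw))
    return finalizer(obj) do o
        # Atomic store instead of a plain write to a `Ref`.
        atomic_flag_finalized[] = true
        Libc.free(o.ptr)
    end
end
```

Reading the flag with `atomic_flag_finalized[]` is likewise an atomic load, so an observer on another thread would not race with the finalizer's store.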

aviatesk · 2024-10-02T07:44:32Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-10-02T08:43:35Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk · 2024-10-02T10:19:25Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-10-02T11:18:17Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk · 2024-10-04T10:14:48Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-10-04T11:13:44Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk · 2024-10-11T07:40:55Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-10-11T08:39:44Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk · 2024-10-12T05:51:45Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2024-10-12T06:51:02Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.


aviatesk · 2024-10-16T12:12:37Z

Since this PR doesn't cause any latency issues in its current state, I'll go ahead and merge it as is. That said, the cases optimized by this PR are still limited, and to handle more general cases, we'll need to push forward with further extensions like #55990.

aviatesk mentioned this pull request Oct 1, 2024

Explore early finalization JuliaGPU/CUDA.jl#2443

Open

aviatesk force-pushed the avi/EA-finalizer-2 branch from 8a5fd27 to f9b6863 Compare October 2, 2024 07:44

aviatesk force-pushed the avi/EA-finalizer-2 branch from d4a4c5e to 592ef2a Compare October 2, 2024 09:31

aviatesk force-pushed the avi/EA-finalizer-2 branch 3 times, most recently from fd959c7 to fbdba63 Compare October 4, 2024 10:13

aviatesk force-pushed the avi/EA-finalizer-2 branch 5 times, most recently from 7fd3ba9 to f188487 Compare October 11, 2024 07:40

aviatesk force-pushed the avi/EA-finalizer-2 branch from f188487 to ac2ac7a Compare October 12, 2024 05:51

aviatesk force-pushed the avi/EA-finalizer-2 branch from ac2ac7a to 0d57d35 Compare October 15, 2024 06:13

aviatesk added 3 commits October 16, 2024 14:35

fix EA regression

2a6ac0f

better EAUtils.jl

e50b947

aviatesk force-pushed the avi/EA-finalizer-2 branch from 0d57d35 to e50b947 Compare October 16, 2024 05:35

aviatesk merged commit b19a7c1 into master Oct 16, 2024
7 checks passed

aviatesk deleted the avi/EA-finalizer-2 branch October 16, 2024 12:13
