-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
optimizer: allow EA-powered finalizer inlining
#55954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
|
The call |
|
Would using atomic operations help avoid those data races? If this kind of pattern is dangerous from its nature, I’m thinking of limiting the tests to just IR checks and removing the ones that involve execution. |
8a5fd27 to
f9b6863
Compare
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
d4a4c5e to
592ef2a
Compare
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
fd959c7 to
fbdba63
Compare
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
7fd3ba9 to
f188487
Compare
|
@nanosoldier |
|
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
f188487 to
ac2ac7a
Compare
|
@nanosoldier |
|
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
ac2ac7a to
0d57d35
Compare
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
ptr = Libc.malloc(sizeof(T) * length)
ptr = Base.unsafe_convert(Ptr{T}, ptr)
obj = ForeignBuffer{T}(ptr)
return finalizer(obj) do obj
Base.@assume_effects :notaskstate :nothrow
foreign_buffer_finalized[] = true
Libc.free(obj.ptr)
end
end
function f_EA_finalizer(N::Int)
workspace = foreign_alloc(Float64, N)
GC.@preserve workspace begin
(;ptr) = workspace
Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1 = Base.mul_int(8, N)::Int64
│ %2 = Core.lshr_int(%1, 63)::Int64
│ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8
│ %4 = Core.eq_int(%3, 0x01)::Bool
└─── goto #3 if not %4
2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└─── unreachable
3 ── goto #4
4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64
└─── goto #5
5 ── goto #6
6 ── goto #7
7 ── goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└─── goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└─── goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│ %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│ $(Expr(:gc_preserve_end, :(%19)))
│ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│ Base.setfield!(%23, :x, true)::Bool
│ %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└─── return nothing
) => Nothing
```
However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.
Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
for limited frames because of latency reasons, which significantly
reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
algorithm requires `:nothrow`-ness on all paths from the allocation of
the mutable struct to its last use, which is not practical for
real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
optimizations such as inserting a `finalize` call after the last use
might still be possible.
0d57d35 to
e50b947
Compare
|
Since this PR doesn't cause any latency issues at this state, I'll go ahead and merge it as is. That said, the cases optimized by this PR are still limited, and to handle more general cases, we'll need to push forward with further extensions like #55990. |
E.g. this allows
finalizerinlining in the following case:However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as
broken. I also need to check its impact on compiler performance.Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas:
:nothrowcheck for finalizer inlining: the current algorithm requires:nothrow-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when:nothrowcannot be guaranteed, auxiliary optimizations such as inserting afinalizecall after the last use might still be possible.@nanosoldier
runbenchmarks("inference", vs=":master")