Skip to content

Conversation

@xal-0
Copy link
Member

@xal-0 xal-0 commented Aug 19, 2025

We already save some memory here by deleting the jl_native_code_desc_t after
we're done serializing the combined module, but some data in the module's
LLVM::Context lives on until the end of the scope in jl_dump_native_impl.
Once we're done with the module, move the lock and ThreadSafeContext so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3 GiB is
in use. Deleting the jl_native_code_desc_t (as before) saves about 600 MiB,
and cleaning up the context saves an additional ~500 MiB.

@xal-0 xal-0 added performance Must go faster compiler:llvm For issues that relate to LLVM labels Aug 19, 2025
@gbaraldi gbaraldi added backport 1.11 Change should be backported to release-1.11 backport 1.12 Change should be backported to release-1.12 labels Aug 19, 2025
Comment on lines +2274 to +2281
data_outputs = compile(*dataM, "text", threads, [data, &lock, &TSCtx](Module &) {
// Delete data when add_output thinks it's done with it
// Saves memory for use when multithreading
auto lock2 = std::move(lock);
delete data;
// Drop last reference to shared LLVM::Context
auto TSCtx2 = std::move(TSCtx);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to use modern C++ features for this (though it might require replacing std::function with llvm::unique_function, which hasn't yet been standardized, so that it compiles to do the right thing)

Suggested change
data_outputs = compile(*dataM, "text", threads, [data, &lock, &TSCtx](Module &) {
// Delete data when add_output thinks it's done with it
// Saves memory for use when multithreading
auto lock2 = std::move(lock);
delete data;
// Drop last reference to shared LLVM::Context
auto TSCtx2 = std::move(TSCtx);
});
data_outputs = compile(*dataM, "text", threads, [data, lock = std::move(lock), TSCtx = std::move(TSCtx)](Module &) {
// Delete data when add_output thinks it's done with it
// Saves memory for use when multithreading
delete data;
});

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd need to explicitly invoke the destructor for the lambda: godbolt link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is the design mistake llvm::unique_function may fix (over std::function)

This was referenced Aug 19, 2025
@xal-0 xal-0 merged commit ceeb661 into JuliaLang:master Aug 22, 2025
12 checks passed
KristofferC pushed a commit that referenced this pull request Sep 1, 2025
…59329)

We already save some memory here by deleting the `jl_native_code_desc_t`
after
we're done serializing the combined module, but some data in the
module's
`LLVM::Context` lives on until the end of the scope in
`jl_dump_native_impl`.
Once we're done with the module, move the lock and `ThreadSafeContext`
so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3
GiB is
in use. Deleting the `jl_native_code_desc_t` (as before) saves about 600
MiB,
and cleaning up the context saves an additional ~500 MiB.

(cherry picked from commit ceeb661)
KristofferC pushed a commit that referenced this pull request Sep 2, 2025
…59329)

We already save some memory here by deleting the `jl_native_code_desc_t`
after
we're done serializing the combined module, but some data in the
module's
`LLVM::Context` lives on until the end of the scope in
`jl_dump_native_impl`.
Once we're done with the module, move the lock and `ThreadSafeContext`
so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3
GiB is
in use. Deleting the `jl_native_code_desc_t` (as before) saves about 600
MiB,
and cleaning up the context saves an additional ~500 MiB.

(cherry picked from commit ceeb661)
DilumAluthge added a commit that referenced this pull request Sep 5, 2025
Backported PRs:
- [x] #54840 <!-- Add boundscheck in speccache_eq to avoid OOB access
due to data race -->
- [x] #42080 <!-- recommend explicit `using Foo: Foo, ...` in package
code (was: "using considered harmful") -->
- [x] #58127 <!-- [DOC] Update installation docs: /downloads/ =>
/install/ -->
- [x] #58202 <!-- [release-1.11] malloc: use jl_get_current_task to fix
null check -->
- [x] #58584 <!-- Make `Ptr` values static-show w/ type-information -->
- [x] #58637 <!-- Make late gc lower handle insertelement of alloca use.
-->
- [x] #58837 <!-- fix null comparisons for non-standard address spaces
-->
- [x] #57826 <!-- Add a `similar` method for `Type{<:CodeUnits}` -->
- [x] #58293 <!-- fix trailing indices stackoverflow in reinterpreted
array -->
- [x] #58887 <!-- Pkg: Allow configuring can_fancyprint(io::IO) using
IOContext -->
- [x] #58937 <!-- Fix nthreadpools size in JLOptions -->
- [x] #58978 <!-- Fix precompilepkgs warn loaded setting -->
- [x] #58998 <!-- Bugfix: Use Base.aligned_sizeof instead of sizeof in
Mmap.mmap -->
- [x] #59120 <!-- Fix memory order typo in "src/julia_atomics.h" -->
- [x] #59170 <!-- Clarify and enhance confusing precompile test -->

Need manual backport:
- [ ] #56329 <!-- loading: clean up more concurrency issues -->
- [ ] #56956 <!-- Add "mea culpa" to foreign module assignment error.
-->
- [ ] #57035 <!-- linux: workaround to avoid deadlock inside
dl_iterate_phdr in glibc -->
- [ ] #57089 <!-- Block thread from receiving profile signal with
stackwalk lock -->
- [ ] #57249 <!-- restore non-freebsd-unix fix for profiling -->
- [ ] #58011 <!-- Remove try-finally scope from `@time_imports`
`@trace_compile` `@trace_dispatch` -->
- [ ] #58062 <!-- remove unnecessary edge from `exp_impl` to `pow` -->
- [ ] #58157 <!-- add showing a string to REPL precompile workload -->
- [ ] #58209 <!-- Specialize `one` for the `SizedArray` test helper -->
- [ ] #58108 <!-- Base.get_extension & Dates.format made public -->
- [ ] #58356 <!-- codegen: remove readonly from abstract type calling
convention -->
- [ ] #58415 <!-- [REPL] more reliable extension loading -->
- [ ] #58510 <!-- Don't filter `Core` methods from newly-inferred list
-->
- [ ] #58110 <!-- relax dispatch for the `IteratorSize` method for
`Generator` -->
- [ ] #58965 <!-- Fix `hygienic-scope`s in inner macro expansions -->
- [ ] #58971 <!-- Fix alignment of failed precompile jobs on CI -->
- [ ] #59066 <!-- build: Also pass -fno-strict-aliasing for C++ -->

Contains multiple commits, manual intervention needed:
- [ ] #55877 <!-- fix FileWatching designs and add workaround for a stat
bug on Apple -->
- [ ] #56755 <!-- docs: fix scope type of a `struct` to hard -->
- [ ] #57809 <!-- Fix fptrunc Float64 -> Float16 rounding through
Float32 -->
- [ ] #57398 <!-- Make remaining float intrinsics require float
arguments -->
- [ ] #56351 <!-- Fix `--project=@script` when outside script directory
-->
- [ ] #57129 <!-- clarify that time_ns is monotonic -->
- [ ] #58134 <!-- Note annotated string API is experimental in Julia
1.11 in HISTORY.md -->
- [ ] #58401 <!-- check that hashing of types does not foreigncall
(`jl_type_hash` is concrete evaluated) -->
- [ ] #58435 <!-- Fix layout flags for types that have oddly sized
primitive type fields -->
- [ ] #58483 <!-- Fix tbaa usage when storing into heap allocated
immutable structs -->
- [ ] #58512 <!-- Make more types jl_static_show readably -->
- [ ] #58012 <!-- Re-enable tab completion of kwargs for large method
tables -->
- [ ] #58683 <!-- Add 0 predecessor to entry basic block and handle it
in inlining -->
- [ ] #59112 <!-- Add builtin function name to add methods error -->

Non-merged PRs with backport label:
- [ ] #59329 <!-- aotcompile: destroy LLVM context after serializing
combined module -->
- [ ] #58848 <!-- Set array size only when safe to do so -->
- [ ] #58535 <!-- gf.c: include const-return methods in
`--trace-compile` -->
- [ ] #58038 <!-- strings/cstring: `transcode`: prevent Windows sysimage
invalidation -->
- [ ] #57604 <!-- `@nospecialize` for `string_index_err` -->
- [ ] #57366 <!-- Use ptrdiff_t sized offsets for gvars_offsets to allow
large sysimages -->
- [ ] #56890 <!-- Enable getting non-boxed LLVM type from Julia Type -->
- [ ] #56823 <!-- Make version of opaque closure constructor in world
-->
- [ ] #55958 <!-- also redirect JL_STDERR etc. when redirecting to
devnull -->
- [ ] #55956 <!-- Make threadcall gc safe -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->

---------

Co-authored-by: Kiran Pamnany <[email protected]>
Co-authored-by: adienes <[email protected]>
Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Keno Fischer <[email protected]>
Co-authored-by: Simeon David Schaub <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Alex Arslan <[email protected]>
Co-authored-by: Fons van der Plas <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
Co-authored-by: JonasIsensee <[email protected]>
Co-authored-by: Curtis Vogt <[email protected]>
Co-authored-by: Dilum Aluthge <[email protected]>
Co-authored-by: DilumAluthgeBot <[email protected]>
Co-authored-by: DilumAluthge <[email protected]>
DilumAluthge pushed a commit that referenced this pull request Sep 9, 2025
…59329)

We already save some memory here by deleting the `jl_native_code_desc_t`
after
we're done serializing the combined module, but some data in the
module's
`LLVM::Context` lives on until the end of the scope in
`jl_dump_native_impl`.
Once we're done with the module, move the lock and `ThreadSafeContext`
so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3
GiB is
in use. Deleting the `jl_native_code_desc_t` (as before) saves about 600
MiB,
and cleaning up the context saves an additional ~500 MiB.

(cherry picked from commit ceeb661)
@DilumAluthge DilumAluthge mentioned this pull request Sep 9, 2025
59 tasks
@DilumAluthge DilumAluthge removed the backport 1.11 Change should be backported to release-1.11 label Sep 9, 2025
@KristofferC KristofferC removed the backport 1.12 Change should be backported to release-1.12 label Sep 11, 2025
@DilumAluthge DilumAluthge added the backport 1.11 Change should be backported to release-1.11 label Sep 11, 2025
KristofferC pushed a commit that referenced this pull request Sep 15, 2025
…59329)

We already save some memory here by deleting the `jl_native_code_desc_t`
after
we're done serializing the combined module, but some data in the
module's
`LLVM::Context` lives on until the end of the scope in
`jl_dump_native_impl`.
Once we're done with the module, move the lock and `ThreadSafeContext`
so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3
GiB is
in use. Deleting the `jl_native_code_desc_t` (as before) saves about 600
MiB,
and cleaning up the context saves an additional ~500 MiB.

(cherry picked from commit ceeb661)
DilumAluthge added a commit that referenced this pull request Nov 3, 2025
Backported PRs:
- [x] #57523 <!-- Remove usages of weak symbols -->
- [x] #58127 <!-- [DOC] Update installation docs: /downloads/ =>
/install/ -->
- [x] #58202 <!-- [release-1.11] malloc: use jl_get_current_task to fix
null check -->
- [x] #58554 <!-- remove workaround for controlling terminal behavior in
repl_cmd -->
- [x] #58584 <!-- Make `Ptr` values static-show w/ type-information -->
- [x] #59062 <!-- remove a testset from MMAP that might cause CI to now
fail on Windows -->
- [x] #59300 <!-- Update the developer docs to reflect the use of
JuliaSyntax.jl -->
- [x] #57604 <!-- `@nospecialize` for `string_index_err` -->
- [x] #59329 <!-- aotcompile: destroy LLVM context after serializing
combined module -->
- [x] #59418 <!-- Fix startup when history file is bad -->
- [x] #56890 <!-- Enable getting non-boxed LLVM type from Julia Type -->
- [x] #59559 <!-- codegen: mark write barrier field load as volatile -->
- [x] #59572 <!-- Only apply Base.Sort.SubArrayOptimization when
iszero(v.offset1) -->

Need manual backport:
- [ ] #56329 <!-- loading: clean up more concurrency issues -->
- [ ] #56956 <!-- Add "mea culpa" to foreign module assignment error.
-->
- [ ] #57035 <!-- linux: workaround to avoid deadlock inside
dl_iterate_phdr in glibc -->
- [ ] #57089 <!-- Block thread from receiving profile signal with
stackwalk lock -->
- [ ] #57249 <!-- restore non-freebsd-unix fix for profiling -->
- [ ] #58011 <!-- Remove try-finally scope from `@time_imports`
`@trace_compile` `@trace_dispatch` -->
- [ ] #58062 <!-- remove unnecessary edge from `exp_impl` to `pow` -->
- [ ] #58157 <!-- add showing a string to REPL precompile workload -->
- [ ] #58209 <!-- Specialize `one` for the `SizedArray` test helper -->
- [ ] #58356 <!-- codegen: remove readonly from abstract type calling
convention -->
- [ ] #58415 <!-- [REPL] more reliable extension loading -->
- [ ] #58510 <!-- Don't filter `Core` methods from newly-inferred list
-->
- [ ] #58110 <!-- relax dispatch for the `IteratorSize` method for
`Generator` -->
- [ ] #58965 <!-- Fix `hygienic-scope`s in inner macro expansions -->
- [ ] #58971 <!-- Fix alignment of failed precompile jobs on CI -->
- [ ] #59066 <!-- build: Also pass -fno-strict-aliasing for C++ -->
- [ ] #59428 <!-- Correctly set the variant bits of uuid1 -->

Contains multiple commits, manual intervention needed:
- [ ] #55877 <!-- fix FileWatching designs and add workaround for a stat
bug on Apple -->
- [ ] #56755 <!-- docs: fix scope type of a `struct` to hard -->
- [ ] #57809 <!-- Fix fptrunc Float64 -> Float16 rounding through
Float32 -->
- [ ] #57398 <!-- Make remaining float intrinsics require float
arguments -->
- [ ] #56351 <!-- Fix `--project=@script` when outside script directory
-->
- [ ] #57129 <!-- clarify that time_ns is monotonic -->
- [ ] #58134 <!-- Note annotated string API is experimental in Julia
1.11 in HISTORY.md -->
- [ ] #58401 <!-- check that hashing of types does not foreigncall
(`jl_type_hash` is concrete evaluated) -->
- [ ] #58435 <!-- Fix layout flags for types that have oddly sized
primitive type fields -->
- [ ] #58483 <!-- Fix tbaa usage when storing into heap allocated
immutable structs -->
- [ ] #58512 <!-- Make more types jl_static_show readably -->
- [ ] #58012 <!-- Re-enable tab completion of kwargs for large method
tables -->
- [ ] #58683 <!-- Add 0 predecessor to entry basic block and handle it
in inlining -->
- [ ] #59112 <!-- Add builtin function name to add methods error -->
- [ ] #56823 <!-- Make version of opaque closure constructor in world
-->
- [ ] #59467 <!-- `CoreLogging`: prevent some `Annotated*`-related
instability -->

Non-merged PRs with backport label:
- [ ] #59450 <!-- Propagate Addrspaces: fix lift of memset -->
- [ ] #58848 <!-- Set array size only when safe to do so -->
- [ ] #55958 <!-- also redirect JL_STDERR etc. when redirecting to
devnull -->
- [ ] #55956 <!-- Make threadcall gc safe -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->

---------

Co-authored-by: Dilum Aluthge <[email protected]>
Co-authored-by: Sam Schweigel <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Adam Wheeler <[email protected]>
Co-authored-by: Neven Sajko <[email protected]>
Co-authored-by: Shuhei Kadowaki <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
Co-authored-by: William Moses <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
Co-authored-by: Lilith Orion Hafner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 1.11 Change should be backported to release-1.11 compiler:llvm For issues that relate to LLVM performance Must go faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants