Skip to content

Conversation

Bromeon
Copy link
Member

@Bromeon Bromeon commented Sep 7, 2025

base() + base_mut() currently clone the Gd<T>, which causes refcount increments in case of T: Inherits<RefCounted>.
Theoretically this shouldn't be necessary, since the guards hold a lifetime tied to the "parent" object (the original Gd).

This PR experiments with ways to avoid this, but is currently blocked on a memory leak that needs further investigation. Not sure if this approach is feasible at all.

@Bromeon Bromeon added c: core Core components performance Performance problems and optimizations labels Sep 7, 2025
@GodotRust
Copy link

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1302

@@ -493,7 +493,7 @@ pub trait WithBaseField: GodotClass + Bounds<Declarer = bounds::DeclUser> {
/// ```
#[allow(clippy::let_unit_value)]
fn base_mut(&mut self) -> BaseMut<'_, Self> {
let base_gd = self.base_field().__constructed_gd();
let weak_gd = self.base_field().__constructed_gd_weak();

let gd = self.to_gd();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.to_gd still increments ref count.

@beicause
Copy link
Contributor

beicause commented Sep 8, 2025

By printing the instance id, the memory leak occurs at the the script instance test:

...
   -- test_script_instance_re_entering_call ... 115628574536
9223372152500127559
...
Leaked instance: RefCounted:9223372152516904774
Leaked instance: :9223372152500127559 - Resource path:
...

@Yarwin
Copy link
Contributor

Yarwin commented Sep 8, 2025

By printing the instance id, the memory leak occurs at the the script instance test

I can reproduce leak in ScriptInstanceTest.gd without any tweaks 🤔.

The reason is simple though – to_script_gd increases refcount:

/// Returns a [`Gd`] referencing the base object, for use in script contexts only.
pub(crate) fn to_script_gd(&self) -> Gd<T> {
#[cfg(debug_assertions)]
assert_eq!(
self.init_state.get(),
InitState::Script,
"to_script_gd() can only be called on script-context Base objects"
);
(*self.obj).clone()
}

and then it is being dropped weak:

make_base_ref!(
ScriptBaseRef,
ScriptInstance,
SiMut,
crate::obj::script::SiMut,
"[`ScriptInstance`]"
);
make_base_mut!(
ScriptBaseMut,
ScriptInstance,
SiMut,
crate::obj::script::SiMut,
"['ScriptInstance']"
);

Changing it to unsafe { Gd::from_obj_sys_weak(self.obj.obj_sys()) } removes the leak

Images with run before/after change (visualizing both the run and change) image image

@Yarwin
Copy link
Contributor

Yarwin commented Sep 8, 2025

I'm curious why does it create 2 leaks (and orphans) instead of n leaks, but that's probably yet another Godot quirk 🙄

WARNING: ObjectDB instances leaked at exit (run with --verbose for details).
     at: cleanup (core/object/object.cpp:2378)                                                                                                                                                                                      
Leaked instance: RefCounted:9223372085877802864
Leaked instance: :9223372085861025649 - Resource path: 
(...)

@Bromeon Bromeon marked this pull request as ready for review September 8, 2025 21:08
@Bromeon
Copy link
Member Author

Bromeon commented Sep 8, 2025

Thanks a lot to both of you for the debugging help, very nice ❤️

It works now, but it's a bit flimsy. I'm considering to abstract this in a WeakGd -- but unlike #1280, it would be purely internal and have the most minimal API, just to reduce the UAF + leakage minefield a bit.

@Bromeon Bromeon added this to the 0.4 milestone Sep 12, 2025
@Bromeon Bromeon force-pushed the perf/base-no-clone branch 3 times, most recently from ee6373d to d6a0dae Compare September 14, 2025 13:03
@Bromeon Bromeon enabled auto-merge September 14, 2025 13:11
@Bromeon Bromeon added this pull request to the merge queue Sep 14, 2025
Merged via the queue into master with commit 369e9d3 Sep 14, 2025
17 checks passed
@Bromeon Bromeon deleted the perf/base-no-clone branch September 14, 2025 13:18
@beicause
Copy link
Contributor

I believe we still can't use base_mut in init because #1302 (comment) is not addressed. I'm not sure if this is feasible, but solving it may make base_mut work in init and predelete, and make to_init_gd no longer needed.

@Bromeon
Copy link
Member Author

Bromeon commented Sep 14, 2025

I believe we still can't use base_mut in init because #1302 (comment) is not addressed. I'm not sure if this is feasible, but solving it may make base_mut work in init and predelete, and make to_init_gd no longer needed.

Could you maybe provide an #[itest] case that exposes the behavior that currently fails, and would pass if implemented correctly? So we have a clear goal + regression test for the future.

@beicause
Copy link
Contributor

beicause commented Sep 15, 2025

Could you maybe provide an #[itest] case that exposes the behavior that currently fails, and would pass if implemented correctly? So we have a clear goal + regression test for the future.

My proof is here: master...beicause:gdext:base_mut_init_drop It works, but I give BaseMut a None guard during init because the instance binding is null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: core Core components performance Performance problems and optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants