From 94f192b2424873b3e29e9fef6d75501ac7d2a2cc Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 16 Jan 2025 20:04:08 +0800 Subject: [PATCH 01/23] Special topic chapter for finalizer & weakref Added a special topic chapter for how to implement finalizers and weak references with MMTk. --- docs/userguide/src/SUMMARY.md | 2 + .../src/portingguide/topics/prefix.md | 5 + .../src/portingguide/topics/weakref.md | 283 ++++++++++++++++++ 3 files changed, 290 insertions(+) create mode 100644 docs/userguide/src/portingguide/topics/prefix.md create mode 100644 docs/userguide/src/portingguide/topics/weakref.md diff --git a/docs/userguide/src/SUMMARY.md b/docs/userguide/src/SUMMARY.md index 494126cc6b..071f5aa70d 100644 --- a/docs/userguide/src/SUMMARY.md +++ b/docs/userguide/src/SUMMARY.md @@ -36,6 +36,8 @@ - [Performance Tuning](portingguide/perf_tuning/prefix.md) - [Link Time Optimization](portingguide/perf_tuning/lto.md) - [Optimizing Allocation](portingguide/perf_tuning/alloc.md) + - [Special Topics](portingguide/topics/prefix.md) + - [Finalizers and Weak References](portingguide/topics/weakref.md) - [API Migration Guide](migration/prefix.md) - [Template (for mmtk-core developers)](migration/template.md) diff --git a/docs/userguide/src/portingguide/topics/prefix.md b/docs/userguide/src/portingguide/topics/prefix.md new file mode 100644 index 0000000000..493bd551f2 --- /dev/null +++ b/docs/userguide/src/portingguide/topics/prefix.md @@ -0,0 +1,5 @@ +# Special topics + +Every VM is special in some way. Because of this, some VM bindings may use MMTk features not +usually used by most VMs, and may even deviate from the usual steps of integrating MMTk into the VM. +Here we provide special guides to cover such cases. 
diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md new file mode 100644 index 0000000000..6611935e0e --- /dev/null +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -0,0 +1,283 @@ +# Finalizers and Weak References + +Some VMs support **finalizers**. In simple terms, finalizers are clean-up operations associated +with an object, and are executed when the object is dead. + +Some VMs support **weak references**. If an object cannot be reached from roots following only +strong references, the object will be considered dead. Weak references to dead objects will be +cleared, and associated clean-up operations will be executed. Some VMs also support more complex +weak data structures, such as weak hash tables, where keys, values, or both, can be weak references. + +The concrete semantics of finalizer and weak reference varies from VM to VM, but MMTk provides a +low-level API that allows the VM bindings to implement their flavours of finalizer and weak +references on top of it. + +**A note for Java programmers**: In Java, the term "weak reference" often refers to instances of +`java.lang.ref.Reference` (including the concrete classes `SoftReference`, `WeakReference`, +`PhantomReference` and the hidden `FinalizerReference` class used by some JVM implementations to +implement finalizers). Instances of `Reference` are proper Java heap objects, but each instance has +a field that contains a pointer to the referent, and the field can be cleared when the referent +dies. In this article, we use the term "weak reference" to refer to the pointer inside that field. +In other words, a Java `Reference` instance has a field that holds a weak reference to the referent. + +## Overview + +During each GC, after the transitive closure is computed, MMTk calls `Scanning::process_weak_refs` +which is implemented by the VM binding. Inside this function, the VM binding can do several things. 
+ +- **Query reachability**: The VM binding can query whether any given object has been reached in + the transitive closure. + - **Query forwarded address**: If an object is already reached, the VM binding can further + query the new address of an object. This is needed to support copying GC. + - **Retain object**: If an object is not reached, the VM binding can optionally request to + retain (i.e. "resurrect") the object. It will keep that object *and all descendants* + alive. +- **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be + *called again* after computing the transitive closure that includes *retained objects and their + descendants*. This helps handling multiple levels of weak reference strength. + +Concretely, + +- `ObjectReference::is_reachable()` queries reachability, +- `ObjectReference::get_forwarded_object()` queries forwarded address, and +- the `tracer_context` argument provided by the `Scanning::process_weak_refs` function can retain + objects. +- Returning `true` from `Scanning::process_weak_refs` will make it called again. + +The `Scanning::process_weak_refs` function also gives the VM binding a chance to perform other +operations, including (but not limited to) + +- **Do clean-up operations**: The VM binding can perform clean-up operations, or queue them to be + executed after GC. +- **update fields** that contain weak references. + - **Forward the field**: It can write the forwarded address of the referent if moved by a + copying GC. + - **Clear the field**: It can clear the field if the referent is unreachable. + +Using those primitive operations, the VM binding can support different flavours of finalizers and/or +weak references. We will discuss different use cases in the following sections. + +## Supporting finalizers + +Different VMs define "finalizer" differently, but they all involve performing operations when an +object is dead. 
The general way to handle finalizer is visiting all **finalizable objects** (i.e. +objects that have associated finalization operations), check if they are dead and, if dead, do +something about them. + +### Identifying finalizable objects + +Some VMs determine whether an object is finalizable by its type. In Java, for example, an object is +finalizable if its `finalize()` method is overridden. We can register instances of such types when +they are constructed. + +Some VMs can attach finalizing operations to an object after it is created. The VM can maintain a +list of objects with attached finalizers, or maintain a (weak) hash map that maps finalizable +objects to its associated finalizers. + +### When to run finalizers? + +Depending on the semantics, finalizers can be executed during GC or during mutator time after GC. + +The VM binding can run finalizers in `Scanning::process_weak_refs` after finding a finalizable +object dead. But beware that MMTk is usually run with multiple GC workers. The VM binding can +parallelise the operations by creating work packets. The `Scanning::process_weak_refs` function is +executed in the `VMRefClosure` stage, so the created work packets shall be added to the same bucket. + +If the finalizers should be executed after GC, the VM binding should enqueue them to VM-specific +queues so that they can be picked up after GC. + +### Reading the body of dead object + +In some VMs, finalizers can read the fields in dead objects. Such fields usually include +information needed for cleaning up resources held by the object, such as file descriptors and +pointers to memory or objects not managed by GC. + +`Scanning::process_weak_refs` is executed in the `VMRefClosure` stage, which happens after the +strong transitive closure (including all objects reachable from roots following only strong +references) has been computed, but before any object has been released (which happens in the +`Release` stage). 
This means the body of all objects, live or dead, can still be accessed during +this stage. + +Therefore, if the VM needs to execute finalizers during GC, the VM binding can execute them in +`process_weak_refs`, or create work packets in the `VMRefClosure` stage. + +However, if the VM needs to execute finalizers after GC, there will be a problem because the object +will be reclaimed, and memory of the object will be overwritten by other objects. In this case, the +VM will need to "resurrect" the dead object. + +### Resurrecting dead objects + +Some VMs, particularly the Java VM, execute finalizers during mutator time. The dead finalizable +objects must be brought back to life so that they can still be accessed after the GC. + +The `Scanning::process_weak_refs` has a parameter `tracer_context: impl ObjectTracerContext`. +This parameter provides the necessary mechanism to retain (i.e. "resurrect") objects and make them +(and their descendants) live through the current GC. The typical use pattern is: + +```rust +impl Scanning for VMScanning { + fn process_weak_refs( + worker: &mut GCWorker, + tracer_context: impl ObjectTracerContext, + ) -> bool { + let finalizable_objects = ...; + let mut new_finalizable_objects = vec![]; + + tracer_context.with_tracer(worker, |tracer| { + for object in finalizable_objects { + if object.is_reachable() { + // Object is still alive, and may be moved if it's copying GC. + let new_object = object.get_forwarded_object().unwrap_or(object); + new_finalizable_objects.push(new_object); + } else { + // Object is dead. Retain it. + let new_object = tracer.trace_object(object); + enqueue_finalizable_object_to_be_executed_later(new_object); + } + } + }); + + // more code ... + } +} +``` + +The `tracer` parameter of the closure is an `ObjectTracer`. It provides the `trace_object` method +which retains an object and returns the forwarded address. 
+ +`tracer_context.with_tracer` creates a temporary `ObjectTracer` instance which the VM binding can +use within the given closure. Objects retained by `trace_object` in the closure are enqueued. +After the closure returns, `with_tracer` will create reasonably-sized work packets for tracing the +retained objects and their descendants. Therefore, the VM binding is encouraged to use one +`with_tracer` invocation to retain as many objects as needed. Do not call `with_tracer` too often, +or it will create too many small work packets, which hurts the performance. + +Keep in mind that **`ObjectTracerContext` implements `Clone`**. If the VM has too many finalizable +objects, it is advisable to split the list of finalizable objects into smaller chunks. Create one +work packet for each chunk, and give each work packet a clone of `tracer_context` so that multiple +work packets can process finalizable objects in parallel. + + +## Supporting weak references + +The general way to handle weak references is, after computing the transitive closure, iterate +through all fields that contain weak references to objects. For each field, + +- if the referent is already reached, write the new address of the object to the field (or do + nothing if the object is not moved); +- otherwise, clear the field, writing `null`, `nil`, or whatever represents a cleared weak + reference to the field. + +### Identifying weak references + +Weak references in global slots, including fields of global data structures as well as keys and/or +values in global weak tables, are relatively straightforward. We just need to enumerate them in +`Scanning::process_weak_refs`. + +There are also fields in heap objects that hold weak references to other heap objects. There +are two basic ways to identify them. + +- **Register on creation**: We may record objects that contain such fields in a global list when + such objects are created. 
In `Scanning::process_weak_refs`, we just need to iterate through + this list, process the fields, and remove dead objects from the list. +- **Discover objects during tracing**: While computing the transitive closure, we scan objects and + discover objects that contain weak reference fields. We enqueue such objects into a list, and + iterate through the list in `Scanning::process_weak_refs` after transitive closure. The list + needs to be reconstructed in each GC. + +Both methods work, but each has its advantages and disadvantages. Registering on creation does not +need to reconstruct the list in every GC, while discovering during tracing can avoid visiting dead +objects. Depending on the nature of your VM, one method may be easier to implement than the other, +especially if your VM's existing GC has already implemented weak reference processing in some way. + +### Associated clean-up operations + +Some languages and VMs allow certain clean-up operations to be associated with weak references, and +will be executed after the weak reference is cleared. + +Such clean-up operations can be supported similar to finalizers. While we enumerate weak references +in `Scanning::process_weak_refs`, we clear weak references to unreachable objects. Depending on the +semantics, such as whether the clean-up operation can access the body of unreachable referent, we +may choose to execute the clean-up operation immediately, or enqueue them to be executed after GC, +and may even resurrect the unreachable referent if we need to. + +### Soft references + +Java has a special kind of weak reference: `SoftReference`. The API allows the GC to choose whether +to retain or clear references to softly reachable objects. When using MMTk, there are two ways to +implement it. + +The easiest way is **treating `SoftReference` as strong references in non-emergency GCs, and +treating them as weak references in emergency GCs**. 
During non-emergency GC, we let +`Scanning::scan_objects` scan the weak reference field inside a `SoftReference` instance as if it +were an ordinary strong reference field. In this way, the (strong) transitive closure after the +`Closure` stage will also include softly reachable objects, and they will be retained. During +emergency GC, however, skip this field in `Scanning::scan_objects`, and clear `SoftReference` just +like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly reachable objects will +be dead if not subject to finalization. + +The other way is **retaining `SoftReference` after the strong closure**. This involves supporting +multiple levels of reference strengths, which will be introduced in the next section. + +### Multiple levels of reference strength + +Some VMs support multiple levels of weak reference strengths. Java, for example, has +`SoftReference`, `WeakReference`, `FinalizerReference` (internal) and `PhantomReference`, in the +order of decreasing strength. + +This can be supported by running `Scanning::process_weak_refs` multiple times. If +`process_weak_refs` returns `true`, it will be called again after all pending work packets in the +`VMRefClosure` stage have been executed. That includes all work packets that compute the transitive +closure from objects retained (i.e. "resurrected") during `process_weak_refs`. This allows the VM +binding to expand the transitive closure multiple times, each retaining objects at different levels +of reachability. + +Taking Java as an example, we may run `process_weak_refs` four times. + +1. Visit all `SoftReference`. + - If the referent is reachable, then + - forward the referent field. + - If the referent is unreachable, choose one of the following: + - Retain the referent and update the referent field. + - Clear the referent field, remove the `SoftReference` from the list of soft references, + and optionally enqueue it to the associated `ReferenceQueue` if it has one. 
+ - (This step may expand the transitive closure if any referents are retained.) +2. Visit all `WeakReference`. + - If the referent is reachable, then + - forward the referent field. + - If the referent is unreachable, then + - clear the referent field, remove the `WeakReference` from the list of weak references, + and optionally enqueue it to the associated `ReferenceQueue` if it has one. + - (This step cannot expand the transitive closure.) +3. Visit the list of finalizable objects (may be implemented as `FinalizerReference` by some JVMs). + - If the finalizable object is reachable, then + - forward the reference to it since it may have been moved. + - If the finalizable object is unreachable, then + - remove it from the list of finalizable objects, and enqueue it for finalization. + - (This step may expand the transitive closure if any finalizable objects are retained.) +4. Visit all `PhantomReference`. + - If the referent is reachable, then + - forward the referent field. (Note: `PhantomReference#get()` always returns `null`, but + the actual referent field shall hold a valid reference to the referent.) + - If the referent is unreachable, then + - clear the referent field, remove the `PhantomReference` from the list of phantom + references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. + - (This step cannot expand the transitive closure.) + +As an optimization, Step 1 can be eliminated by merging it with the strong closure in non-emergency +GC, or with `WeakReference` processing in emergency GC, as we described in the previous section. +Step 2 can be merged with Step 3 since Step 2 never expands the transitive closure. Therefore, we +only need to run `process_weak_refs` twice: + +1. Handle `WeakReference` (and also `SoftReference` in emergency GC), and then handle finalizable + objects. +2. Handle `PhantomReference`. 
+ +### Ephemerons + +TODO + + + From be789b22a18d6fc50cd3264256a7d4e2ac2c3731 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Tue, 21 Jan 2025 19:50:22 +0800 Subject: [PATCH 02/23] State machine and deprecated things --- .../src/portingguide/topics/weakref.md | 47 +++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 6611935e0e..33ccd8707b 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -273,11 +273,58 @@ only need to run `process_weak_refs` twice: objects. 2. Handle `PhantomReference`. +To implement this, the VM binding may need to implement some kind of *state machine* so that the +`Scanning::process_weak_refs` function behaves differently each time it is called. For example, + +```rust +fn process_weak_ref(...) -> bool { + let mut state = /* Get VM-specific states here. */; + + match *state { + State::ProcessSoftReference => { + process_soft_references(...); + *state = State::ProcessWeakReference; + return true; // Run this function again. + } + State::ProcessWeakReference => { + process_weak_references(...); + *state = State::ProcessFinalizableObjects; + return true; // Run this function again. + } + State::ProcessFinalizableObjects => { + process_finalizable_objects(...); + *state = State::ProcessPhantomReferences; + return true; // Run this function again. + } + State::ProcessPhantomReferences => { + process_phantom_references(...); + *state = State::ProcessSoftReference; + return false; // Proceed to the Release stage. + } + } + +} +``` + ### Ephemerons TODO +## Deprecated reference and finalizable processors + +When porting MMTk from JikesRVM to a dedicated Rust library, we also ported the `ReferenceProcessor` +and the `FinalizableProcessor` from JikesRVM. 
They are implemented in mmtk-core, and provide the +mechanisms for handling Java-style soft/weak/phantom references and finalizable objects. The VM +binding can use those utilities by implementing the `mmtk::vm::ReferenceGlue` and the +`mmtk::vm::Finalizable` traits, and calling the +`mmtk::memory_manager::add_{soft,weak,phantom}_candidate` and the +`mmtk::memory_manager::add_finalizer` functions. + +However, those mechanisms are too specific to Java, and are not applicable to most other VMs. **New +VM bindings should use the `Scanning::process_weak_refs` API**, and we are porting existing VM +bindings away from the built-in reference/finalizable processors. + From a176a4186326c71cc26b9d1ba7931e57f629e18c Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Wed, 22 Jan 2025 16:19:32 +0800 Subject: [PATCH 03/23] Optimization section --- .../src/portingguide/topics/weakref.md | 60 +++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 33ccd8707b..5f0c498ae4 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -311,6 +311,66 @@ fn process_weak_ref(...) -> bool { TODO +## Optimizations + +### Generational GC + +MMTk provides generational GC plans. Currently, there are `GenCopy`, `GenImmix` and `StickyImmix`. +In a minor GC, a generational plan only considers *young objects* (i.e. objects allocated since the +last GC) as candidates of garbage, and will assume all *old objects* (i.e. objects survived the last +GC) are live. + +The VM binding can query if the current GC is a nursery GC by calling + +```rust +let is_nursery_gc = mmtk.get_plan().is_some_and(|gen| gen.is_current_gc_nursery()); +``` + +The VM binding can make use of this information when processing finalizers and weak references. 
In +a minor GC, + +- The VM binding only needs to visit **finalizable objects allocated since the last GC**. Other + finalizable objects must be old and will not be considered dead. +- The VM binding only needs to visit **weak reference slots written since the last GC**. Other + slots must be pointing to old objects (if not `null`). For weak hash tables, if existing + entries are immutable, it is sufficient to visit newly added entries. + +Implementation-wise, the VM binding can split the list or hash tables into two parts: one for old +entries and another for young entries. + +### Copying versus non-copying GC + +During non-copying GC, objects will not be moved. In MMTk, `MarkSweep` never moves any objects. +`MarkCompact`, `SemiSpace` always moves all objects. Immix-based plans sometimes do non-copying GC, +and sometimes do copying GC. Regardless of the plan, the VM binding can query if the current GC is +a copying GC by calling + +```rust +let may_move_object = mmtk.get_plan().current_gc_may_move_object(); +``` + +If it returns `false`, the current GC will not move any object. + +The VM binding can make use of this information. For example, if a weak hash table uses object +addresses as keys, and the hash code is computed directly from the address, then the VM will need to +rehash the table during copying GC because changing the address may move the entry to a different +hash bin. But if the current GC is non-moving, the VM binding will not need to rehash the table, +but only needs to remove entries for dead objects. Despite this optimization opportunity, we +still recommend VMs to implement *address-based hashing* if possible. + +```admonish info +When using **address-based hashing**, the hash code of an object depends on whether its hash code +has been observed before, and whether it has been moved after its hash code has been observed. + +- If never observed, the hash code of an object will be its current address. 
+- When the object is moved the first time after its hash code is observed, the GC thread copies its + old address to a field of the new copy. Its hash code will be read from that field. +- When such an object is copied again, its hash code will be copied to the new copy of the object. + The hash code of the object remains unchanged. + +The VM binding needs to implement this in `ObjectModel::copy`. +``` + ## Deprecated reference and finalizable processors When porting MMTk from JikesRVM to a dedicated Rust library, we also ported the `ReferenceProcessor` From f1c0970ca2f8bf062ad998b438ffdf770755b5af Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Wed, 22 Jan 2025 20:11:22 +0800 Subject: [PATCH 04/23] Update comments --- src/vm/scanning.rs | 107 ++++++++++++++++++++++++++++----------------- 1 file changed, 67 insertions(+), 40 deletions(-) diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index b02feb7fe1..18a36d8f27 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -287,47 +287,41 @@ pub trait Scanning { /// MMTk core enables the VM binding to do the following in this function: /// /// 1. Query if an object is already reached in this transitive closure. + /// - by calling `ObjectReference::is_reachable()` /// 2. Get the new address of an object if it is already reached. + /// - by calling `ObjectReference::get_forwarded_object()` /// 3. Keep an object and its descendents alive if not yet reached. + /// - using `tracer_context` /// 4. Request this function to be called again after transitive closure is finished again. - /// - /// The VM binding can query if an object is currently reached by calling - /// `ObjectReference::is_reachable()`. - /// - /// If an object is already reached, the VM binding can get its new address by calling - /// `ObjectReference::get_forwarded_object()` as the object may have been moved. - /// - /// If an object is not yet reached, the VM binding can keep that object and its descendents - /// alive. 
To do this, the VM binding should use `tracer_context.with_tracer` to get access to - /// an `ObjectTracer`, and then call its `trace_object(object)` method. The `trace_object` - /// method will return the new address of the `object` if it moved the object, or its original - /// address if not moved. Implementation-wise, the `ObjectTracer` may contain an internal - /// queue for newly traced objects, and will flush the queue when `tracer_context.with_tracer` - /// returns. Therefore, it is recommended to reuse the `ObjectTracer` instance to trace - /// multiple objects. - /// - /// *Note that if `trace_object` is called on an already reached object, the behavior will be - /// equivalent to `ObjectReference::get_forwarded_object()`. It will return the new address if - /// the GC already moved the object when tracing that object, or the original address if the GC - /// did not move the object when tracing it. In theory, the VM binding can use `trace_object` - /// wherever `ObjectReference::get_forwarded_object()` is needed. However, if a VM never - /// resurrects objects, it should completely avoid touching `tracer_context`, and exclusively - /// use `ObjectReference::get_forwarded_object()` to get new addresses of objects. By doing - /// so, the VM binding can avoid accidentally resurrecting objects.* - /// - /// The VM binding can return `true` from `process_weak_refs` to request `process_weak_refs` - /// to be called again after the MMTk core finishes transitive closure again from the objects - /// newly visited by `ObjectTracer::trace_object`. This is useful if a VM supports multiple - /// levels of reachabilities (such as Java) or ephemerons. - /// - /// Implementation-wise, this function is called as the "sentinel" of the `VMRefClosure` work - /// bucket, which means it is called when all work packets in that bucket have finished. The - /// `tracer_context` expands the transitive closure by adding more work packets in the same - /// bucket. 
This means if `process_weak_refs` returns true, those work packets will have - /// finished (completing the transitive closure) by the time `process_weak_refs` is called - /// again. The VM binding can make use of this by adding custom work packets into the - /// `VMRefClosure` bucket. The bucket will be `VMRefForwarding`, instead, when forwarding. - /// See below. + /// - by returning `true` + /// + /// The `tracer_context` parameter provides the VM binding the mechanism for retaining + /// unreachable objects (i.e. keeping them alive in this GC). The snippet shows a typical use + /// case of handling finalizable objects for a Java-like language. + /// + /// ```rust + /// let finalizable_objects = ...; + /// let mut new_finalizable_objects = vec![]; + /// + /// tracer_context.with_tracer(worker, |tracer| { + /// for object in finalizable_objects { + /// if object.is_reachable() { + /// // Object is still alive, and may be moved if it's copying GC. + /// let new_object = object.get_forwarded_object().unwrap_or(object); + /// new_finalizable_objects.push(new_object); + /// } else { + /// // Object is dead. Retain it. + /// let new_object = tracer.trace_object(object); + /// enqueue_finalizable_object_to_be_executed_later(new_object); + /// } + /// } + /// }); + /// ``` + /// + /// Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)` + /// to retain `object` and get its new address if moved. After `with_tracer` returns, it will + /// create work packets in the `VMRefClosure` work bucket to compute the transitive closure from + /// the objects retained in the closure. /// /// The `memory_manager::is_mmtk_object` function can be used in this function if /// - the "is_mmtk_object" feature is enabled, and @@ -338,8 +332,41 @@ pub trait Scanning { /// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to retain and /// update weak references. 
/// - /// This function shall return true if this function needs to be called again after the GC - /// finishes expanding the transitive closure from the objects kept alive. + /// If `process_weak_refs` returns `true`, then `process_weak_refs` will be called again after + /// all work packets in the `VMRefClosure` work bucket have been executed, by which time all + /// objects reachable from the objects retained in this function will have been reached. + /// + /// # Performance notes + /// + /// **Retain as many objects as needed in one invocation of `tracer_context.with_tracer`, and + /// avoid calling `with_tracer` again and again** for each object. The `tracer` provided by + /// `ObjectTracerFactory::with_tracer` enqueues retained objects in an internal list specific to + /// this invocation of `with_tracer`, and will create reasonably sized work packets to compute + /// the transitive closure. This means the invocation of `with_tracer` has a non-trivial + /// overhead, but each invocation of `tracer.trace_object` is cheap. + /// + /// *Don't do this*: + /// + /// ```rust + /// for object in objects { + /// tracer_context.with_tracer(worker, |tracer| { // This is expensive! DONT DO THIS! + /// tracer.trace_object(object); + /// }); + /// } + /// ``` + /// + /// **Use `ObjectReference::get_forwarded_object()` to get the forwarded address of reachable + /// objects. Only use `tracer.trace_object` for retaining unreachable objects.** If + /// `trace_object` is called on an already reached object, it will also return its new address + /// if moved. However, `tracer_context.with_tracer` has a cost, and the VM binding may + /// accidentally resurrect unreachable objects if it fails to check `object.is_reachable()` first. + /// If the VM binding does not intend to resurrect objects, it should completely avoid touching + /// `tracer_context`. + /// + /// **Clone the `tracer_context` for parallelism.** The `ObjectTracerContext` has `Clone` as its + /// supertrait. 
The VM binding can clone it and distribute each clone into a work packet. By + /// doing so, the VM binding can parallelize the processing of finalizers and weak references by + /// creating multiple work packets. fn process_weak_refs( _worker: &mut GCWorker, _tracer_context: impl ObjectTracerContext, From 29d443c433602a66e9d5d6dabd5a1e908eadbea7 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 23 Jan 2025 10:20:59 +0800 Subject: [PATCH 05/23] Minor fixes --- src/vm/scanning.rs | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index 18a36d8f27..ae4bc04a1c 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -282,7 +282,9 @@ pub trait Scanning { /// Process weak references. /// - /// This function is called after a transitive closure is completed. + /// This function is called in a GC after the transitive closure from roots is computed, that + /// is, all reachable objects from roots are reached. This function gives the VM binding an + /// opportunity to process finalizers and weak references. /// /// MMTk core enables the VM binding to do the following in this function: /// @@ -296,8 +298,8 @@ pub trait Scanning { /// - by returning `true` /// /// The `tracer_context` parameter provides the VM binding the mechanism for retaining - /// unreachable objects (i.e. keeping them alive in this GC). The snippet shows a typical use - /// case of handling finalizable objects for a Java-like language. + /// unreachable objects (i.e. keeping them alive in this GC). The following snippet shows a + /// typical use case of handling finalizable objects for a Java-like language. 
/// /// ```rust /// let finalizable_objects = ...; From 79c83989ead6554c94067dced8bb1f0f6183b3d9 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 23 Jan 2025 10:31:03 +0800 Subject: [PATCH 06/23] Use the word "resurrect" consistently --- .../src/portingguide/topics/weakref.md | 52 +++++++++---------- src/vm/scanning.rs | 26 +++++----- 2 files changed, 39 insertions(+), 39 deletions(-) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 5f0c498ae4..63aa84eccd 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -29,19 +29,18 @@ which is implemented by the VM binding. Inside this function, the VM binding ca the transitive closure. - **Query forwarded address**: If an object is already reached, the VM binding can further query the new address of an object. This is needed to support copying GC. - - **Retain object**: If an object is not reached, the VM binding can optionally request to - retain (i.e. "resurrect") the object. It will keep that object *and all descendants* - alive. + - **Resurrect object**: If an object is not reached, the VM binding can optionally resurrect + the object. It will keep that object *and all descendants* alive. - **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be - *called again* after computing the transitive closure that includes *retained objects and their - descendants*. This helps handling multiple levels of weak reference strength. + *called again* after computing the transitive closure that includes *resurrected objects and + their descendants*. This helps handling multiple levels of weak reference strength. 
Concretely, - `ObjectReference::is_reachable()` queries reachability, - `ObjectReference::get_forwarded_object()` queries forwarded address, and -- the `tracer_context` argument provided by the `Scanning::process_weak_refs` function can retain - objects. +- the `tracer_context` argument provided by the `Scanning::process_weak_refs` function can + resurrect objects. - Returning `true` from `Scanning::process_weak_refs` will make it called again. The `Scanning::process_weak_refs` function also gives the VM binding a chance to perform other @@ -111,8 +110,8 @@ Some VMs, particularly the Java VM, executes finalizers during mutator time. Th objects must be brought back to life so that they can still be accessed after the GC. The `Scanning::process_weak_refs` has an parameter `tracer_context: impl ObjectTracerContext`. -This parameter provides the necessary mechanism to retain (i.e. "resurrect") objects and make them -(and their descendants) live through the current GC. The typical use pattern is: +This parameter provides the necessary mechanism to resurrect objects and make them (and their +descendants) live through the current GC. The typical use pattern is: ```rust impl Scanning for VMScanning { @@ -130,7 +129,7 @@ impl Scanning for VMScanning { let new_object = object.get_forwarded_object().unwrap_or(object); new_finalizable_objects.push(new_object); } else { - // Object is dead. Retain it. + // Object is dead. Resurrect it. let new_object = tracer.trace_object(object); enqueue_finalizable_object_to_be_executed_later(new_object); } @@ -143,14 +142,14 @@ impl Scanning for VMScanning { ``` The `tracer` parameter of the closure is an `ObjectTracer`. It provides the `trace_object` method -which retains an object and returns the forwarded address. +which resurrects an object and returns the forwarded address. `tracer_context.with_tracer` creates a temporary `ObjectTracer` instance which the VM binding can -use within the given closure. 
Objects retained by `trace_object` in the closure are enqueued. +use within the given closure. Objects resurrected by `trace_object` in the closure are enqueued. After the closure returns, `with_tracer` will create reasonably-sized work packets for tracing the -retained objects and their descendants. Therefore, the VM binding is encouraged use one -`with_tracer` invocation to retain as many objects as needed. Do not call `with_tracer` too often, -or it will create too many small work packets, which hurts the performance. +resurrected objects and their descendants. Therefore, the VM binding is encouraged use one +`with_tracer` invocation to resurrect as many objects as needed. Do not call `with_tracer` too +often, or it will create too many small work packets, which hurts the performance. Keep in mind that **`ObjectTracerContext` implements `Clone`**. If the VM has too many finalizable objects, it is advisable to split the list of finalizable objects into smaller chunks. Create one @@ -204,20 +203,21 @@ and may even resurrect the unreachable referent if we need to. ### Soft references Java has a special kind of weak reference: `SoftReference`. The API allows the GC to choose whether -to retain or clear references to softly reachable objects. When using MMTk, there are two ways to -implement it. +to resurrect or clear references to softly reachable objects. When using MMTk, there are two ways +to implement it. The easiest way is **treating `SoftReference` as strong references in non-emergency GCs, and treating them as weak references in emergency GCs**. During non-emergency GC, we let `Scanning::scan_objects` scan the weak reference field inside a `SoftReference` instance as if it were an ordinary strong reference field. In this way, the (strong) transitive closure after the -`Closure` stage will also include softly reachable objects, and they will be retained. During +`Closure` stage will also include softly reachable objects, and they will be resurrected. 
During emergency GC, however, skip this field in `Scanning::scan_objects`, and clear `SoftReference` just like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly reachable objects will be dead if not subject to finalization. -The other way is **retaining `SoftReference` after the strong closure**. This involves supporting -multiple levels of reference strengths, which will be introduced in the next section. +The other way is **resurrecting referents of `SoftReference` after the strong closure**. This +involves supporting multiple levels of reference strengths, which will be introduced in the next +section. ### Multiple levels of reference strength @@ -228,9 +228,9 @@ order of decreasing strength. This can be supported by running `Scanning::process_weak_refs` multiple times. If `process_weak_refs` returns `true`, it will be called again after all pending work packets in the `VMRefClosure` stage has been executed. That include all work packets that compute the transitive -closure from objects retained (i.e. "resurrected") during `process_weak_refs`. This allows the VM -binding to expand the transitive closure multiple times, each retaining objects at different levels -of reachability. +closure from objects resurrected during `process_weak_refs`. This allows the VM binding to expand +the transitive closure multiple times, each handling weak references at different levels of +reachability. Take Java as an example, we may run `process_weak_refs` four times. @@ -238,10 +238,10 @@ Take Java as an example, we may run `process_weak_refs` four times. - If the referent is reachable, then - forward the referent field. - If the referent is unreachable, choose between one of the following: - - Retain the referent and update the referent field. + - Resurrect the referent and update the referent field. 
- Clear the referent field, remove the `SoftReference` from the list of soft references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. - - (This step may expand the transitive closure if any referents are retained.) + - (This step may expand the transitive closure if any referents are resurrected.) 2. Visit all `WeakReference`. - If the referent is reachable, then - forward the referent field. @@ -254,7 +254,7 @@ Take Java as an example, we may run `process_weak_refs` four times. - forward the reference to it since it may have been moved. - If the finalizable object is unreachable, then - remove it from the list of finalizable objects, and enqueue it for finalization. - - (This step may expand the transitive closure if any finalizable objects are retained.) + - (This step may expand the transitive closure if any finalizable objects are resurrected.) 4. Visit all `PhantomReference`. - If the referent is reachable, then - forward the referent field. (Note: `PhantomReference#get()` always returns `null`, but diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index ae4bc04a1c..9fe975b515 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -297,7 +297,7 @@ pub trait Scanning { /// 4. Request this function to be called again after transitive closure is finished again. /// - by returning `true` /// - /// The `tracer_context` parameter provides the VM binding the mechanism for retaining + /// The `tracer_context` parameter provides the VM binding the mechanism for resurrecting /// unreachable objects (i.e. keeping them alive in this GC). The following snippet shows a /// typical use case of handling finalizable objects for a Java-like language. /// @@ -312,7 +312,7 @@ pub trait Scanning { /// let new_object = object.get_forwarded_object().unwrap_or(object); /// new_finalizable_objects.push(new_object); /// } else { - /// // Object is dead. Retain it. + /// // Object is dead. resurrect it. 
/// let new_object = tracer.trace_object(object); /// enqueue_finalizable_object_to_be_executed_later(new_object); /// } @@ -321,9 +321,9 @@ pub trait Scanning { /// ``` /// /// Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)` - /// to retain `object` and get its new address if moved. After `with_tracer` returns, it will - /// create work packets in the `VMRefClosure` work bucket to compute the transitive closure from - /// the objects retained in the closure. + /// to resurrect `object` and get its new address if moved. After `with_tracer` returns, it + /// will create work packets in the `VMRefClosure` work bucket to compute the transitive closure + /// from the objects resurrected in the closure. /// /// The `memory_manager::is_mmtk_object` function can be used in this function if /// - the "is_mmtk_object" feature is enabled, and @@ -331,21 +331,21 @@ pub trait Scanning { /// /// Arguments: /// * `worker`: The current GC worker. - /// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to retain and + /// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to resurrect and /// update weak references. /// /// If `process_weak_refs` returns `true`, then `process_weak_refs` will be called again after /// all work packets in the `VMRefClosure` work bucket has been executed, by which time all - /// objects reachable from the objects retained in this function will have been reached. + /// objects reachable from the objects resurrected in this function will have been reached. /// /// # Performance notes /// - /// **Retain as many objects as needed in one invocation of `tracer_context.with_tracer`, and + /// **Resurrect as many objects as needed in one invocation of `tracer_context.with_tracer`, and /// avoid calling `with_tracer` again and again** for each object. 
The `tracer` provided by - /// `ObjectTracerFactory::with_tracer` enqueues retained objects in an internal list specific to - /// this invocation of `with_tracer`, and will create reasonably sized work packets to compute - /// the transitive closure. This means the invocation of `with_tracer` has a non-trivial - /// overhead, but each invocation of `tracer.trace_object` is cheap. + /// `ObjectTracerFactory::with_tracer` enqueues resurrected objects in an internal list specific + /// to this invocation of `with_tracer`, and will create reasonably sized work packets to + /// compute the transitive closure. This means the invocation of `with_tracer` has a + /// non-trivial overhead, but each invocation of `tracer.trace_object` is cheap. /// /// *Don't do this*: /// @@ -358,7 +358,7 @@ pub trait Scanning { /// ``` /// /// **Use `ObjectReference::get_forwarded_object()` to get the forwarded address of reachable - /// objects. Only use `tracer.trace_object` for retaining unreachable objects.** If + /// objects. Only use `tracer.trace_object` for resurrecting unreachable objects.** If /// `trace_object` is called on an already reached object, it will also return its new address /// if moved. However, `tracer_context.with_tracer` has a cost, and the VM binding may /// accidentally resurrect unreachable objects if failed to check `object.is_reachable()` first. 
From fc946c8623e8e8f5c0a5530d2cbde2ad67723f43 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 23 Jan 2025 12:39:12 +0800 Subject: [PATCH 07/23] Rewrite part of the finalizers section --- .../src/portingguide/topics/weakref.md | 138 ++++++++++-------- src/vm/scanning.rs | 4 +- 2 files changed, 78 insertions(+), 64 deletions(-) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 63aa84eccd..05ae51f8ed 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -22,26 +22,22 @@ In other words, a Java `Reference` instance has a field that holds a weak refere ## Overview -During each GC, after the transitive closure is computed, MMTk calls `Scanning::process_weak_refs` -which is implemented by the VM binding. Inside this function, the VM binding can do several things. - -- **Query reachability**: The VM binding can query whether any given object has been reached in - the transitive closure. - - **Query forwarded address**: If an object is already reached, the VM binding can further - query the new address of an object. This is needed to support copying GC. - - **Resurrect object**: If an object is not reached, the VM binding can optionally resurrect - the object. It will keep that object *and all descendants* alive. +During each GC, after the transitive closure is computed (i.e. after all objects reachable from +roots have been reached), MMTk calls `Scanning::process_weak_refs` which is implemented by the VM +binding. Inside this function, the VM binding can do several things. + +- **Query reachability**: The VM binding can query whether any given object has been reached. + + Do this with `ObjectReference::is_reachable()`. +- **Query forwarded address**: If an object is already reached, the VM binding can further query + the new address of an object. This is needed to support copying GC. 
+ + Do this with `ObjectReference::get_forwarded_object()`. +- **Resurrect objects**: If an object is not reached, the VM binding can optionally resurrect the + object. It will keep that object *and all descendants* alive. + + Do this with the `tracer_context` argument of `process_weak_refs`. - **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be - *called again* after computing the transitive closure that includes *resurrected objects and - their descendants*. This helps handling multiple levels of weak reference strength. - -Concretely, - -- `ObjectReference::is_reachable()` queries reachability, -- `ObjectReference::get_forwarded_object()` queries forwarded address, and -- the `tracer_context` argument provided by the `Scanning::process_weak_refs` function can - resurrect objects. -- Returning `true` from `Scanning::process_weak_refs` will make it called again. + called again after computing the transitive closure that includes *resurrected objects and their + descendants*. This helps handling multiple levels of weak reference strength. + + Do this by returning `true` from `process_weak_refs`. The `Scanning::process_weak_refs` function also gives the VM binding a chance to perform other operations, including (but not limited to) @@ -54,55 +50,56 @@ operations, including (but not limited to) - **Clear the field**: It can clear the field if the referent is unreachable. Using those primitive operations, the VM binding can support different flavours of finalizers and/or -weak references. We will discuss different use cases in the following sections. +weak references. We will discuss common use cases in the following sections. ## Supporting finalizers Different VMs define "finalizer" differently, but they all involve performing operations when an object is dead. The general way to handle finalizer is visiting all **finalizable objects** (i.e. 
-objects that have associated finalization operations), check if they are dead and, if dead, do -something about them. +objects that have associated finalization operations), check if they are unreachable and, if +unreachable, do something about them. ### Identifying finalizable objects Some VMs determine whether an object is finalizable by its type. In Java, for example, an object is -finalizable if its `finalize()` method is overridden. We can register instances of such types when -they are constructed. +finalizable if its `finalize()` method is overridden. The VM binding can maintain a list of +finalizable objects, and register instances of such types into that list when they are constructed. -Some VMs can attach finalizing operations to an object after it is created. The VM can maintain a -list of objects with attached finalizers, or maintain a (weak) hash map that maps finalizable -objects to its associated finalizers. +Some VMs can dynamically attach finalizing operations to individual objects after objects are +created. The VM binding can maintain a list of objects with attached finalizers, or maintain a +(weak) hash map that maps finalizable objects to its associated finalizers. ### When to run finalizers? -Depending on the semantics, finalizers can be executed during GC or during mutator time after GC. +Depending on the finalizer semantics in different VMs, finalizers can be executed during GC or +during mutator time after GC. -The VM binding can run finalizers in `Scanning::process_weak_refs` after finding a finalizable -object dead. But beware that MMTk is usually run with multiple GC workers. The VM binding can -parallelise the operations by creating work packets. The `Scanning::process_weak_refs` function is -executed in the `VMRefClosure` stage, so the created work packets shall be added to the same bucket. +The VM binding can run finalizers immediately in `Scanning::process_weak_refs` when finding a +finalizable object unreachable. 
Beware that executing finalizers can be time-consuming. The VM +binding can create work packets and let each work packet process a part of all finalizable +objects. In this way, multiple GC workers can process finalizable objects in parallel. The +`Scanning::process_weak_refs` function is executed in the `VMRefClosure` stage, so the created work +packets shall be added to the same bucket. -If the finalizers should be executed after GC, the VM binding should enqueue them to VM-specific -queues so that they can be picked up after GC. +If the finalizers should be executed after GC, the VM binding should enqueue such jobs to +VM-specific queues so that they can be picked up by mutator threads after GC. ### Reading the body of dead object In some VMs, finalizers can read the fields in dead objects. Such fields usually include information needed for cleaning up resources held by the object, such as file descriptors and -pointers to memory or objects not managed by GC. +pointers to memory not managed by GC. -`Scanning::process_weak_refs` is executed in the `VMRefClosure` stage, which happens after the -strong transitive closure (including all objects reachable from roots following only strong -references) has been computed, but before any object has been released (which happens in the -`Release` stage). This means the body of all objects, live or dead, can still be accessed during -this stage. +`Scanning::process_weak_refs` is executed in the `VMRefClosure` stage, which happens after computing +transitive closure, but before any object has been released (which happens in the `Release` stage). +This means the body of all objects, live or dead, can still be accessed during this stage. -Therefore, if the VM needs to execute finalizers during GC, the VM binding can execute them in -`process_weak_refs`, or create work packets in the `VMRefClosure` stage.
+Therefore, there is no problem reading the object body if the VM binding executes finalizers +immediately in `process_weak_refs`, or in created work packets in the `VMRefClosure` stage. -However, if the VM needs to execute finalizers after GC, there will be a problem because the object -will be reclaimed, and memory of the object will be overwritten by other objects. In this case, the -VM will need to "resurrect" the dead object. +However, if the VM needs to execute finalizers after GC, it will be a problem because the object +will have been reclaimed, and memory of the object will have been overwritten by other objects. In +this case, the VM will need to "resurrect" the dead object. ### Resurrecting dead objects @@ -141,20 +138,34 @@ impl Scanning for VMScanning { } ``` -The `tracer` parameter of the closure is an `ObjectTracer`. It provides the `trace_object` method -which resurrects an object and returns the forwarded address. +Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)` to +resurrect `object`. It returns the new address of `object` because in a copying GC the +`trace_object` function can also move the object. -`tracer_context.with_tracer` creates a temporary `ObjectTracer` instance which the VM binding can -use within the given closure. Objects resurrected by `trace_object` in the closure are enqueued. -After the closure returns, `with_tracer` will create reasonably-sized work packets for tracing the -resurrected objects and their descendants. Therefore, the VM binding is encouraged use one -`with_tracer` invocation to resurrect as many objects as needed. Do not call `with_tracer` too -often, or it will create too many small work packets, which hurts the performance. +Under the hood, `tracer_context.with_tracer` creates a queue and calls the closure. The `tracer` +implements the `ObjectTracer` trait, and is just an interface that provides the `trace_object` +method. 
Objects resurrected by `tracer.trace_object` will be enqueued. After the closure returns, +`with_tracer` will split the queue into reasonably-sized work packets and add them to the +`VMRefClosure` work bucket. Those work packets will trace the resurrected objects and their +descendants, effectively expanding the transitive closure to include objects reachable from the +resurrected objects. Because of the overhead of creating queues and work packets, the VM binding +should **resurrect as many objects as needed in one invocation of `with_tracer`, and avoid calling +`with_tracer` again and again for each object**. -Keep in mind that **`ObjectTracerContext` implements `Clone`**. If the VM has too many finalizable -objects, it is advisable to split the list of finalizable objects into smaller chunks. Create one -work packets for each chunk, and give each work packet a clone of `tracer_context` so that multiple -work packets can process finalizable objects in parallel. +**Don't do this**: + +```rust +for object in objects { + tracer_context.with_tracer(worker, |tracer| { // This is expensive! DON'T DO THIS! + tracer.trace_object(object); + }); +} +``` + +Keep in mind that **`tracer_context` implements the `Clone` trait**. As introduced in the *When to +run finalizers* section, the VM binding can use work packets to parallelise finalizer processing. +If finalizable objects can be resurrected, the VM binding can clone the `tracer_context` and give +each work packet a clone of `tracer_context`. ## Supporting weak references @@ -223,7 +234,7 @@ section. Some VMs support multiple levels of weak reference strengths. Java, for example, has `SoftReference`, `WeakReference`, `FinalizerReference` (internal) and `PhantomReference`, in the -order of decreasing strength. +order of decreasing strength. This can be supported by running `Scanning::process_weak_refs` multiple times.
If `process_weak_refs` returns `true`, it will be called again after all pending work packets in the @@ -264,10 +275,13 @@ Take Java as an example, we may run `process_weak_refs` four times. references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. - (This step cannot expand the transitive closure.) -As an optimization, Step 1 can be eliminated by merging it with the strong closure in non-emergency -GC, or with `WeakReference` processing in emergency GC, as we described in the previous section. -Step 2 can be merged with Step 3 since Step 2 never expands the transitive closure. Therefore, we -only need to run `process_weak_refs` twice: +As an optimization, + +- Step 1 can be eliminated by merging it with the strong closure in non-emergency GC, or with + `WeakReference` processing in emergency GC, as we described in the previous section. +- Step 2 can be merged with Step 3 since Step 2 never expands the transitive closure. + +Therefore, we only need to run `process_weak_refs` twice: 1. Handle `WeakReference` (and also `SoftReference` in emergency GC), and then handle finalizable objects. diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index 9fe975b515..e620f7bf63 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -288,7 +288,7 @@ pub trait Scanning { /// /// MMTk core enables the VM binding to do the following in this function: /// - /// 1. Query if an object is already reached in this transitive closure. + /// 1. Query if an object is already reached. /// - by calling `ObjectReference::is_reachable()` /// 2. Get the new address of an object if it is already reached. /// - by calling `ObjectReference::get_forwarded_object()` @@ -351,7 +351,7 @@ pub trait Scanning { /// /// ```rust /// for object in objects { - /// tracer_context.with_tracer(worker, |tracer| { // This is expensive! DONT DO THIS! + /// tracer_context.with_tracer(worker, |tracer| { // This is expensive! DON'T DO THIS! 
/// tracer.trace_object(object); /// }); /// } From 160368c8c25835dd44a2fe573470cc2807853fb4 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 23 Jan 2025 14:06:12 +0800 Subject: [PATCH 08/23] Revise the weakref and optimization sections --- .../src/portingguide/topics/weakref.md | 79 +++++++++++-------- 1 file changed, 44 insertions(+), 35 deletions(-) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 05ae51f8ed..8eca865b2b 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -187,9 +187,9 @@ values in global weak tables, are relatively straightforward. We just need to e There are also fields that in heap objects that hold weak references to other heap objects. There are two basic ways to identify them. -- **Register on creation**: We may record objects that contain such fields in a global list when - such objects are created. In `Scanning::process_weak_refs`, we just need to iterate through - this list, process the fields, and remove dead objects from the list. +- **Register on creation**: We may record objects that contain weak reference fields in a global + list when such objects are created. In `Scanning::process_weak_refs`, we just need to iterate + through this list, process the fields, and remove dead objects from the list. - **Discover objects during tracing**: While computing the transitive closure, we scan objects and discover objects that contain weak reference fields. We enqueue such objects into a list, and iterate through the list in `Scanning::process_weak_refs` after transitive closure. The list @@ -207,24 +207,25 @@ will be executed after the weak reference is cleared. Such clean-up operations can be supported similar to finalizers. While we enumerate weak references in `Scanning::process_weak_refs`, we clear weak references to unreachable objects. 
Depending on the -semantics, such as whether the clean-up operation can access the body of unreachable referent, we -may choose to execute the clean-up operation immediately, or enqueue them to be executed after GC, -and may even resurrect the unreachable referent if we need to. +semantics, we may choose to execute the clean-up operations immediately, or enqueue them to be +executed after GC. We may resurrect the unreachable referent if we need to. ### Soft references -Java has a special kind of weak reference: `SoftReference`. The API allows the GC to choose whether -to resurrect or clear references to softly reachable objects. When using MMTk, there are two ways -to implement it. +Java has a special kind of weak reference: `SoftReference`. The API allows the GC to choose between +(1) resurrecting softly reachable referents, and (2) clearing references to softly reachable +objects. When using MMTk, there are two ways to implement this semantics. The easiest way is **treating `SoftReference` as strong references in non-emergency GCs, and -treating them as weak references in emergency GCs**. During non-emergency GC, we let -`Scanning::scan_objects` scan the weak reference field inside a `SoftReference` instance as if it -were an ordinary strong reference field. In this way, the (strong) transitive closure after the -`Closure` stage will also include softly reachable objects, and they will be resurrected. During -emergency GC, however, skip this field in `Scanning::scan_objects`, and clear `SoftReference` just -like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly reachable objects will -be dead if not subject to finalization. +treating them as weak references in emergency GCs**. + +- During non-emergency GC, we let `Scanning::scan_objects` scan the weak reference field inside a + `SoftReference` instance as if it were an ordinary strong reference field. 
In this way, the + (strong) transitive closure after the `Closure` stage will also include softly reachable + objects, and they will be kept alive just like strongly reachable objects. +- During emergency GC, however, skip this field in `Scanning::scan_objects`, and clear + `SoftReference` just like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly + reachable objects will be dead unless they are subject to finalization. The other way is **resurrecting referents of `SoftReference` after the strong closure**. This involves supporting multiple levels of reference strengths, which will be introduced in the next @@ -238,21 +239,25 @@ order of decreasing strength. This can be supported by running `Scanning::process_weak_refs` multiple times. If `process_weak_refs` returns `true`, it will be called again after all pending work packets in the -`VMRefClosure` stage has been executed. That include all work packets that compute the transitive -closure from objects resurrected during `process_weak_refs`. This allows the VM binding to expand -the transitive closure multiple times, each handling weak references at different levels of -reachability. +`VMRefClosure` stage has been executed. Those work packets include all work packets that compute +the transitive closure from objects resurrected during `process_weak_refs`. This allows the VM +binding to expand the transitive closure multiple times, each handling weak references at different +levels of reachability. Take Java as an example, we may run `process_weak_refs` four times. 1. Visit all `SoftReference`. - If the referent is reachable, then - forward the referent field. - - If the referent is unreachable, choose between one of the following: - - Resurrect the referent and update the referent field. - - Clear the referent field, remove the `SoftReference` from the list of soft references, - and optionally enqueue it to the associated `ReferenceQueue` if it has one. 
- - (This step may expand the transitive closure if any referents are resurrected.) + - If the referent is unreachable, then + - if it is not emergency GC, then + - resurrect the referent and update the referent field. + - it it is emergency GC, then + - clear the referent field, remove the `SoftReference` from the list of soft + references, and optionally enqueue it to the associated `ReferenceQueue` if it has + one. + - (This step may expand the transitive closure in emergency GC if any referents are + resurrected.) 2. Visit all `WeakReference`. - If the referent is reachable, then - forward the referent field. @@ -262,14 +267,15 @@ Take Java as an example, we may run `process_weak_refs` four times. - (This step cannot expand the transitive closure.) 3. Visit the list of finalizable objects (may be implemented as `FinalizerReference` by some JVMs). - If the finalizable object is reachable, then - - forward the reference to it since it may have been moved. + - forward the reference to it. - If the finalizable object is unreachable, then - remove it from the list of finalizable objects, and enqueue it for finalization. - (This step may expand the transitive closure if any finalizable objects are resurrected.) 4. Visit all `PhantomReference`. - If the referent is reachable, then - - forward the referent field. (Note: `PhantomReference#get()` always returns `null`, but - the actual referent field shall hold a valid reference to the referent.) + - forward the referent field. + - (Note: `PhantomReference#get()` always returns `null`, but the actual referent field + shall hold a valid reference to the referent.) - If the referent is unreachable, then - clear the referent field, remove the `PhantomReference` from the list of phantom references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. @@ -337,7 +343,8 @@ GC) are live. 
The VM binding can query if the current GC is a nursery GC by calling ```rust -let is_nursery_gc = mmtk.get_plan().is_some_and(|gen| gen.is_current_gc_nursery()); +let is_nursery_gc = mmtk.get_plan().generational().is_some_and(|gen| + gen.is_current_gc_nursery()); ``` The VM binding can make use of this information when processing finalizers and weak references. In @@ -347,14 +354,14 @@ a minor GC, finalizable objects must be old and will not be considered dead. - The VM binding only needs to visit **weak reference slots written since the last GC**. Other slots must be pointing to old objects (if not `null`). For weak hash tables, if existing - entries are immutable, it is sufficient to visit newly added entires. + entries are immutable, it is sufficient to only visit newly added entires. -Implementation-wise, the VM binding can split the list or hash tables into two parts: one for old +Implementation-wise, the VM binding can split the lists or hash tables into two parts: one for old entries and another for young entries. ### Copying versus non-copying GC -During non-copying GC, objects will not be moved. In MMTk, `MarkSweep` never moves any objects. +MMTk provides both copying and non-copying GC plans. `MarkSweep` never moves any objects. `MarkCompact`, `SemiSpace` always moves all objects. Immix-based plans sometimes do non-copying GC, and sometimes do copying GC. Regardless of the plan, the VM binding can query if the current GC is a copying GC by calling @@ -370,15 +377,17 @@ addresses as keys, and the hash code is computed directly from the address, then rehash the table during copying GC because changing the address may move the entry to a different hash bin. But if the current GC is non-moving, the VM binding will not need to rehash the table, but only needs to remove entries for dead objects. Despite of this optimization opportunity, we -still recommend VMs to implement *address-based hashing* if possible. 
+still recommend VMs to implement *address-based hashing* if possible. In that case, we never need +to rehash any hash tables due to object movement. ```admonish info When using **address-based hashing**, the hash code of an object depends on whether its hash code has been observed before, and whether it has been moved after its hash code has been observed. - If never observed, the hash code of an object will be its current address. -- When the object is moved the first time after its hash code is observed, the GC thread copy its - old address to a field of the new copy. Its hash code will be read from that field. +- When the object is moved the first time after its hash code is observed, the GC thread copies + its old address to a field of the new copy. From then on, its hash code will be read from that + field. - When such an object is copied again, its hash code will be copied to the new copy of the object. The hash code of the object remains unchanged. From b9c4086eecf50152bc33a5b8c5c2375cbe85d280 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 23 Jan 2025 14:09:33 +0800 Subject: [PATCH 09/23] Use en_US spelling consistently --- docs/userguide/src/portingguide/topics/weakref.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 8eca865b2b..7b79d036f5 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -9,7 +9,7 @@ cleared, and associated clean-up operations will be executed. Some VMs also sup weak data structures, such as weak hash tables, where keys, values, or both, can be weak references. 
The concrete semantics of finalizer and weak reference varies from VM to VM, but MMTk provides a -low-level API that allows the VM bindings to implement their flavours of finalizer and weak +low-level API that allows the VM bindings to implement their flavors of finalizer and weak references on top of it. **A note for Java programmers**: In Java, the term "weak reference" often refers to instances of @@ -49,7 +49,7 @@ operations, including (but not limited to) copying GC. - **Clear the field**: It can clear the field if the referent is unreachable. -Using those primitive operations, the VM binding can support different flavours of finalizers and/or +Using those primitive operations, the VM binding can support different flavors of finalizers and/or weak references. We will discuss common use cases in the following sections. ## Supporting finalizers @@ -163,7 +163,7 @@ for object in objects { ``` Keep in mind that **tracer_context implements the `Clone` trait**. As introduced in the *When to -run finalizers* section, the VM binding can use work packets to parallelise finalizer processing. +run finalizers* section, the VM binding can use work packets to parallelize finalizer processing. If finalizable objects can be resurrected, the VM binding can clone the `trace_context` and give each work packet a clone of `tracer_context`. @@ -354,7 +354,7 @@ a minor GC, finalizable objects must be old and will not be considered dead. - The VM binding only needs to visit **weak reference slots written since the last GC**. Other slots must be pointing to old objects (if not `null`). For weak hash tables, if existing - entries are immutable, it is sufficient to only visit newly added entires. + entries are immutable, it is sufficient to only visit newly added entries. Implementation-wise, the VM binding can split the lists or hash tables into two parts: one for old entries and another for young entries. 
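The split into old and young entries described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MMTk API: `ObjRef`, `WeakList`, and the `is_reachable` and `forward` closures are hypothetical stand-ins for what a binding would build on `ObjectReference::is_reachable()` and `ObjectReference::get_forwarded_object()`. It also assumes that entries surviving any GC can be treated as old afterwards.

```rust
/// Hypothetical stand-in for an object reference held in a weak slot.
/// A real binding would use MMTk's `ObjectReference` instead.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjRef(pub usize);

/// A list of weak referents split into an old part and a young part.
#[derive(Default)]
pub struct WeakList {
    /// Entries that already survived at least one GC.
    old: Vec<ObjRef>,
    /// Entries registered (written) since the last GC.
    young: Vec<ObjRef>,
}

impl WeakList {
    /// Record a weak referent written since the last GC.
    pub fn register(&mut self, referent: ObjRef) {
        self.young.push(referent);
    }

    pub fn old_len(&self) -> usize {
        self.old.len()
    }

    pub fn young_len(&self) -> usize {
        self.young.len()
    }

    /// Process the list after the transitive closure has been computed.
    ///
    /// In a nursery GC only the young part is visited, because old entries
    /// are assumed to point to old objects, which are live. In a full-heap
    /// GC both parts are visited. Returns the unreachable referents so the
    /// caller can clear the slots or run clean-up operations.
    pub fn process(
        &mut self,
        is_nursery_gc: bool,
        is_reachable: impl Fn(ObjRef) -> bool,
        forward: impl Fn(ObjRef) -> ObjRef,
    ) -> Vec<ObjRef> {
        // Young entries are always candidates; the young part becomes empty.
        let mut candidates = std::mem::take(&mut self.young);
        if !is_nursery_gc {
            // Full-heap GC: old entries must be checked as well.
            candidates.extend(std::mem::take(&mut self.old));
        }

        let mut dead = Vec::new();
        for referent in candidates {
            if is_reachable(referent) {
                // Survivors are kept, using their possibly new address,
                // and are treated as old from now on.
                self.old.push(forward(referent));
            } else {
                dead.push(referent);
            }
        }
        dead
    }
}
```

With this layout, the cost of weak reference processing in a nursery GC is proportional to the number of entries added since the last GC rather than to the size of the whole list.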
From 9ec8d5d2dd1207bfe685ae06a662b653e8f340ac Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 23 Jan 2025 17:25:30 +0800 Subject: [PATCH 10/23] Ephemeron --- .../src/portingguide/topics/weakref.md | 54 ++++++++++++++++--- 1 file changed, 46 insertions(+), 8 deletions(-) diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/topics/weakref.md index 7b79d036f5..b0775182da 100644 --- a/docs/userguide/src/portingguide/topics/weakref.md +++ b/docs/userguide/src/portingguide/topics/weakref.md @@ -219,13 +219,15 @@ objects. When using MMTk, there are two ways to implement this semantics. The easiest way is **treating `SoftReference` as strong references in non-emergency GCs, and treating them as weak references in emergency GCs**. -- During non-emergency GC, we let `Scanning::scan_objects` scan the weak reference field inside a - `SoftReference` instance as if it were an ordinary strong reference field. In this way, the - (strong) transitive closure after the `Closure` stage will also include softly reachable - objects, and they will be kept alive just like strongly reachable objects. -- During emergency GC, however, skip this field in `Scanning::scan_objects`, and clear - `SoftReference` just like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly - reachable objects will be dead unless they are subject to finalization. +- During non-emergency GC, we let `Scanning::scan_object` and + `Scanning::scan_object_and_trace_edges` scan the weak reference field inside a `SoftReference` + instance as if it were an ordinary strong reference field. In this way, the (strong) transitive + closure after the `Closure` stage will also include softly reachable objects, and they will be + kept alive just like strongly reachable objects. 
+- During emergency GC, however, skip this field in `Scanning::scan_object` or + `Scanning::scan_object_and_trace_edges` , and clear `SoftReference` just like `WeakReference` in + `Scanning::process_weak_refs`. In this way, softly reachable objects will be dead unless they + are subject to finalization. The other way is **resurrecting referents of `SoftReference` after the strong closure**. This involves supporting multiple levels of reference strengths, which will be introduced in the next @@ -328,7 +330,43 @@ fn process_weak_ref(...) -> bool { ### Ephemerons -TODO +An [Ephemeron] has a *key* and a *value*, both of which are object references. The key is a weak +reference, while the value keeps the referent alive only if both the ephemeron itself and the key +are reachable. + +[Ephemeron]: https://dl.acm.org/doi/10.1145/263700.263733 + +To support ephemerons, the VM binding needs to identify ephemerons. This includes ephemerons as +individual objects, objects that contain ephemerons, and, equivalently, objects that contain +key/value fields that have semantics similar to ephemerons. + +The following is the algorithm for processing ephemerons. It gradually discovers ephemerons as we +do the tracing. We maintain a queue of ephemerons which is empty before the `Closure` stage. + +1. In `Scanning::scan_object` and `Scanning::scan_object_and_trace_edges`, we enqueue ephemerons as + we scan them, but do not trace either the key or the value fields. +2. In `Scanning::process_weak_refs`, we iterate through all ephemerons in the queue. If the key of + an ephemeron is reached, but its value has not yet been reached, then resurrect its value, and + remove the ephemeron from the queue. Otherwise, keep the object in the queue. +3. If any value is resurrected, return `true` from `Scanning::process_weak_refs` so that it will be + called again after the transitive closure from retained values are computed. Then go back to + Step 2. +4. 
If no value is resurrected, the algorithm completes. The queue contains reachable ephemerons + that have unreachable keys. + +This algorithm can be modified if we have a list of all ephemerons before GC starts. We no longer +need to maintain the queue. + +- In Step 1, we don't need to enqueue ephemerons. +- In Step 2, we iterate through all ephemerons, and we resurrect the value if both the ephemeron + itself and the key are reached, and the value is not reached yet. We don't need to remove any + ephemeron from the list. +- When the algorithm completes, we can identify both reachable and unreachable ephemerons that + have unreachable keys. But we need to remove unreachable (dead) ephemerons from the list + because they will be recycled in the `Release` stage. + +And we can go through ephemerons with unreachable keys and do necessary clean-up operations, either +immediately or postponed to mutator time. ## Optimizations From e671ab0bc708a10a5d340a9b71902d2dac22a72b Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Wed, 12 Feb 2025 14:23:42 +0800 Subject: [PATCH 11/23] Rename "Special Topics" to "VM-specific Concerns" --- docs/userguide/src/SUMMARY.md | 4 ++-- .../userguide/src/portingguide/{topics => concerns}/prefix.md | 2 +- .../src/portingguide/{topics => concerns}/weakref.md | 0 3 files changed, 3 insertions(+), 3 deletions(-) rename docs/userguide/src/portingguide/{topics => concerns}/prefix.md (91%) rename docs/userguide/src/portingguide/{topics => concerns}/weakref.md (100%) diff --git a/docs/userguide/src/SUMMARY.md b/docs/userguide/src/SUMMARY.md index 071f5aa70d..03b4ba5018 100644 --- a/docs/userguide/src/SUMMARY.md +++ b/docs/userguide/src/SUMMARY.md @@ -36,8 +36,8 @@ - [Performance Tuning](portingguide/perf_tuning/prefix.md) - [Link Time Optimization](portingguide/perf_tuning/lto.md) - [Optimizing Allocation](portingguide/perf_tuning/alloc.md) - - [Special Topics](portingguide/topics/prefix.md) - - [Finalizers and Weak 
References](portingguide/topics/weakref.md) + - [VM-specific Concerns](portingguide/concerns/prefix.md) + - [Finalizers and Weak References](portingguide/concerns/weakref.md) - [API Migration Guide](migration/prefix.md) - [Template (for mmtk-core developers)](migration/template.md) diff --git a/docs/userguide/src/portingguide/topics/prefix.md b/docs/userguide/src/portingguide/concerns/prefix.md similarity index 91% rename from docs/userguide/src/portingguide/topics/prefix.md rename to docs/userguide/src/portingguide/concerns/prefix.md index 493bd551f2..be4bb5cef6 100644 --- a/docs/userguide/src/portingguide/topics/prefix.md +++ b/docs/userguide/src/portingguide/concerns/prefix.md @@ -1,4 +1,4 @@ -# Special topics +# VM-specific Concerns Every VM is special in some way. Because of this, some VM bindings may use MMTk features not usually used by most VMs, and may even deviate from the usual steps of integrating MMTk into the VM. diff --git a/docs/userguide/src/portingguide/topics/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md similarity index 100% rename from docs/userguide/src/portingguide/topics/weakref.md rename to docs/userguide/src/portingguide/concerns/weakref.md From 2dd16e2e8237e42ae8aa7cad910c990a20f32803 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Wed, 12 Feb 2025 16:36:06 +0800 Subject: [PATCH 12/23] Define finalizer and weak references --- docs/userguide/src/SUMMARY.md | 2 + docs/userguide/src/glossary.md | 36 ++++++++++++++++++ .../src/portingguide/concerns/weakref.md | 38 ++++++++++++++----- 3 files changed, 67 insertions(+), 9 deletions(-) create mode 100644 docs/userguide/src/glossary.md diff --git a/docs/userguide/src/SUMMARY.md b/docs/userguide/src/SUMMARY.md index 03b4ba5018..586facd79d 100644 --- a/docs/userguide/src/SUMMARY.md +++ b/docs/userguide/src/SUMMARY.md @@ -2,6 +2,8 @@ [Introduction](README.md) +[Glossary](glossary.md) + # For GC Developers - [Tutorial: Add a new GC plan to MMTk](tutorial/prefix.md) diff --git 
a/docs/userguide/src/glossary.md b/docs/userguide/src/glossary.md new file mode 100644 index 0000000000..3b4cdc38a6 --- /dev/null +++ b/docs/userguide/src/glossary.md @@ -0,0 +1,36 @@ +# Glossary + +This document explains basic concepts of garbage collection. MMTk uses those terms as described in +this document. Different VMs may define some terms differently. Should there be any confusion, +this document will help disambiguate them. We use the book [*The Garbage Collection Handbook: The +Art of Automatic Memory Management*][GCHandbook] as the primary reference. + +[GCHandbook]: https://gchandbook.org/ + +## Object graph + +Object graph is a graph-theory view of the garbage-collected heap. An **object graph** is a +directed graph that contains *nodes* and *edges*. An edge always points to a node. But unlike +conventional graphs, an edge may originate from either another node or a *root*. + +Each *node* represents an object in the heap. + +Each *edge* represents an object reference from an object or a root. A *root* is a reference held +in a slot directly accessible from [mutators][mutator], including local variables, global variables, +thread-local variables, and so on. An object can have many fields, and some fields may hold +references to objects, while others hold non-reference values. + +An object is *reachable* if there is a path in the object graph from any root to the node of the +object. Unreachable objects cannot be accessed by [mutators][mutator]. They are considered +garbage, and can be reclaimed by the garbage collector. + +[mutator]: #mutator + +## Mutator + +TODO + + + diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index b0775182da..8233f6fcf0 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -1,16 +1,36 @@ # Finalizers and Weak References -Some VMs support **finalizers**. 
In simple terms, finalizers are clean-up operations associated -with an object, and are executed when the object is dead. +Some VMs support *finalizers*, *weak references*, and other complex data structures that have weak +reference semantics, such as weak tables (hash tables where the key, the value or both can be weak +references), ephemerons, etc. The concrete semantics of finalizer and weak reference varies from VM +to VM, but MMTk provides a low-level API that allows the VM bindings to implement their flavors of +finalizer and weak references on top of it. -Some VMs support **weak references**. If an object cannot be reached from roots following only -strong references, the object will be considered dead. Weak references to dead objects will be -cleared, and associated clean-up operations will be executed. Some VMs also support more complex -weak data structures, such as weak hash tables, where keys, values, or both, can be weak references. +## Definitions -The concrete semantics of finalizer and weak reference varies from VM to VM, but MMTk provides a -low-level API that allows the VM bindings to implement their flavors of finalizer and weak -references on top of it. +In this chapter, we use the following definitions. They may be different from the definitions in +concrete VMs. + +**Finalizers** are clean-up operations associated with an object, and are executed when the garbage +collector determines the object is no longer reachable. Depending on the VM, + +- Finalizers may be executed immediately during GC, or postponed to mutator time. +- They may have access to the object body, or executed independently from the object. +- They may "resurrect" the unreachable object, or guarantee unreachable objects remain unreachable + after finalization. + +**Weak references** are special [object graph] edges distinct from ordinary "strong" references. 
+ +- An object is *strongly reachable* if there is a path from roots to the object that contains only + strong references. +- An object is *weakly reachable* if any path from the roots to the object must contain at least + one weak reference. + +The garbage collector may reclaim weakly reachable objects, clear weak reference (by, for +example, assigning `null` to the slot that holds the weak reference), and/or performing associated +clean-up operations. + +[object graph]: ../../glossary.html#object-graph **A note for Java programmers**: In Java, the term "weak reference" often refers to instances of `java.lang.ref.Reference` (including the concrete classes `SoftReference`, `WeakReference`, From e038fc1c568a207bb4d5a2b3ca6ed195cf271ceb Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Wed, 12 Feb 2025 20:21:54 +0800 Subject: [PATCH 13/23] WIP: Use "retain" instead of "resurrect" --- .../src/portingguide/concerns/weakref.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index 8233f6fcf0..6ea988d6ab 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -40,22 +40,23 @@ a field that contains a pointer to the referent, and the field can be cleared wh dies. In this article, we use the term "weak reference" to refer to the pointer inside that field. In other words, a Java `Reference` instance has a field that holds a weak reference to the referent. -## Overview +## Overview of MMTk's finalizer and weak reference processing API -During each GC, after the transitive closure is computed (i.e. after all objects reachable from -roots have been reached), MMTk calls `Scanning::process_weak_refs` which is implemented by the VM -binding. Inside this function, the VM binding can do several things. +During each GC, after the transitive closure is computed (i.e. 
after all objects strongly reachable +from roots have been reached), MMTk calls `Scanning::process_weak_refs` which is implemented by the +VM binding. Inside this function, the VM binding can do several things. - **Query reachability**: The VM binding can query whether any given object has been reached. + Do this with `ObjectReference::is_reachable()`. - **Query forwarded address**: If an object is already reached, the VM binding can further query the new address of an object. This is needed to support copying GC. + Do this with `ObjectReference::get_forwarded_object()`. -- **Resurrect objects**: If an object is not reached, the VM binding can optionally resurrect the - object. It will keep that object *and all descendants* alive. +- **Retain objects**: If an object is not reached at this time, the VM binding can optionally + retain the object. It will make that object *and all descendants* reachable, and keep them + alive during this GC. + Do this with the `tracer_context` argument of `process_weak_refs`. - **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be - called again after computing the transitive closure that includes *resurrected objects and their + called again after computing the transitive closure that includes *retained objects and their descendants*. This helps handling multiple levels of weak reference strength. + Do this by returning `true` from `process_weak_refs`. From 30f5bfcf2aa3d4683cddfef9967d9ec49bfc3103 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 13 Feb 2025 14:08:11 +0800 Subject: [PATCH 14/23] Use "retain" instead of "resurrect" ... unless "resurrect" refers to the application-visible phenomenon. 
--- .../src/portingguide/concerns/weakref.md | 110 +++++++++++------- src/vm/scanning.rs | 34 +++--- 2 files changed, 82 insertions(+), 62 deletions(-) diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index 6ea988d6ab..f43ef4459b 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -26,9 +26,8 @@ collector determines the object is no longer reachable. Depending on the VM, - An object is *weakly reachable* if any path from the roots to the object must contain at least one weak reference. -The garbage collector may reclaim weakly reachable objects, clear weak reference (by, for -example, assigning `null` to the slot that holds the weak reference), and/or performing associated -clean-up operations. +The garbage collector may reclaim weakly reachable objects, clear weak references to weakly +reachable objects, and/or performing associated clean-up operations. [object graph]: ../../glossary.html#object-graph @@ -120,15 +119,16 @@ immediately in `process_weak_refs`, or in created work packets in the `VMRefClos However, if the VM needs to execute finalizers after GC, it will be a problem because the object will have been reclaimed, and memory of the object will have been overwritten by other objects. In -this case, the VM will need to "resurrect" the dead object. +this case, the VM will need to retain the dead object to make it accessible after the current GC. -### Resurrecting dead objects +### Retaining unreachable objects -Some VMs, particularly the Java VM, executes finalizers during mutator time. The dead finalizable -objects must be brought back to life so that they can still be accessed after the GC. +Some VMs, particularly the Java VM, executes finalizers during mutator time. Any finalizable +objects unreachable before a GC must be retained so that they can still be accessed by their +finalizers after the GC. 
The `Scanning::process_weak_refs` has an parameter `tracer_context: impl ObjectTracerContext`. -This parameter provides the necessary mechanism to resurrect objects and make them (and their +This parameter provides the necessary mechanism to retain objects and make them (and their descendants) live through the current GC. The typical use pattern is: ```rust @@ -143,11 +143,11 @@ impl Scanning for VMScanning { tracer_context.with_tracer(worker, |tracer| { for object in finalizable_objects { if object.is_reachable() { - // Object is still alive, and may be moved if it's copying GC. + // Object is still reachable, and may have been moved if it is a copying GC. let new_object = object.get_forwarded_object().unwrap_or(object); new_finalizable_objects.push(new_object); } else { - // Object is dead. Resurrect it. + // Object is unreachable. Retain it. let new_object = tracer.trace_object(object); enqueue_finalizable_object_to_be_executed_later(new_object); } @@ -160,17 +160,17 @@ impl Scanning for VMScanning { ``` Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)` to -resurrect `object`. It returns the new address of `object` because in a copying GC the -`trace_object` function can also move the object. +retain `object`. It returns the new address of `object` because in a copying GC the `trace_object` +function can also move the object. Under the hood, `tracer_context.with_tracer` creates a queue and calls the closure. The `tracer` implements the `ObjectTracer` trait, and is just an interface that provides the `trace_object` -method. Objects resurrected by `tracer.trace_object` will be enqueued. After the closure returns, +method. Objects retained by `tracer.trace_object` will be enqueued. After the closure returns, `with_tracer` will split the queue into reasonably-sized work packets and add them to the -`VMRefClosure` work bucket. 
Those work packets will trace the resurrected objects and their -descendants, effectively expanding the transitive closure to include objects reachable from the -resurrected objects. Because of the overhead of creating queues and work packets, the VM binding -should **resurrect as objects as needed in one invocation of `with_tracer`, and avoid calling +`VMRefClosure` work bucket. Those work packets will trace the retained objects and their +descendants, effectively expanding the transitive closure to include all objects reachable from the +retained objects. Because of the overhead of creating queues and work packets, the VM binding +should **retain as many objects as needed in one invocation of `with_tracer`, and avoid calling `with_tracer` again and again for each object**. **Don't do this**: @@ -185,9 +185,32 @@ for object in objects { Keep in mind that **tracer_context implements the `Clone` trait**. As introduced in the *When to run finalizers* section, the VM binding can use work packets to parallelize finalizer processing. -If finalizable objects can be resurrected, the VM binding can clone the `trace_context` and give +If finalizable objects need to be retained, the VM binding can clone the `trace_context` and give each work packet a clone of `tracer_context`. +### WARNING: object resurrection + +If the VM binding retains an unreachable object for finalization, and the finalizer writes a +reference of that object into a place readable by application threads, including global or static +variable, then the previously unreachable object will become reachable by the application again. +This phenomenon is known as **"resurrection"**, and can be surprising to the programmers. + +Developers of VM bindings of existing VMs may have no choice but implementing the finalizer +semantics strictly according to the specification of the VM, even if that would result in +"resurrection". 
JVM is a well-known example of the "resurrection" behavior, although the +`Object.finalize()` method has been deprecated for removal, in favor of alternative clean-up +mechanisms such as `PhantomReference` and `Cleaner` which never "resurrect" objects. + +Designers of new programming languages or VMs should be aware of the "resurrection" problem. It is +recommended not to let finalizers have access to the object body. For finalizers that need to +release certain resources (such as files), the VM may store relevant data (such as file descriptors) +in a separate object and use that as the context of the finalizer. + +To avoid unintentionally "resurrecting" objects, if the VM binding intends to get the new address of +a moved object, it should use `object.get_forwarded_object()` instead of +`tracer.trace_object(object)`, although the latter also returns the new address if `object` is +already moved. + ## Supporting weak references @@ -201,12 +224,12 @@ through all fields that contain weak references to objects. For each field, ### Identifying weak references -Weak references in global slots, including fields of global data structures as well as keys and/or +Weak references in *global slots*, including fields of global data structures as well as keys and/or values in global weak tables, are relatively straightforward. We just need to enumerate them in `Scanning::process_weak_refs`. -There are also fields that in heap objects that hold weak references to other heap objects. There -are two basic ways to identify them. +There are also *fields* in heap objects that hold weak references to other heap objects. There are +two basic ways to identify them. - **Register on creation**: We may record objects that contain weak reference fields in a global list when such objects are created. In `Scanning::process_weak_refs`, we just need to iterate @@ -229,16 +252,16 @@ will be executed after the weak reference is cleared. 
Such clean-up operations can be supported similar to finalizers. While we enumerate weak references in `Scanning::process_weak_refs`, we clear weak references to unreachable objects. Depending on the semantics, we may choose to execute the clean-up operations immediately, or enqueue them to be -executed after GC. We may resurrect the unreachable referent if we need to. +executed after GC. We may retain the unreachable referent if we need to. ### Soft references Java has a special kind of weak reference: `SoftReference`. The API allows the GC to choose between -(1) resurrecting softly reachable referents, and (2) clearing references to softly reachable -objects. When using MMTk, there are two ways to implement this semantics. +(1) retaining softly reachable referents, and (2) clearing references to softly reachable objects. +When using MMTk, there are two ways to implement this semantics. The easiest way is **treating `SoftReference` as strong references in non-emergency GCs, and -treating them as weak references in emergency GCs**. +treating them like `WeakReference` in emergency GCs**. - During non-emergency GC, we let `Scanning::scan_object` and `Scanning::scan_object_and_trace_edges` scan the weak reference field inside a `SoftReference` @@ -246,13 +269,12 @@ treating them as weak references in emergency GCs**. closure after the `Closure` stage will also include softly reachable objects, and they will be kept alive just like strongly reachable objects. - During emergency GC, however, skip this field in `Scanning::scan_object` or - `Scanning::scan_object_and_trace_edges` , and clear `SoftReference` just like `WeakReference` in + `Scanning::scan_object_and_trace_edges`, and clear `SoftReference` just like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly reachable objects will be dead unless they are subject to finalization. -The other way is **resurrecting referents of `SoftReference` after the strong closure**. 
This -involves supporting multiple levels of reference strengths, which will be introduced in the next -section. +The other way is **retaining referents of `SoftReference` after the strong closure**. This involves +supporting multiple levels of reference strengths, which will be introduced in the next section. ### Multiple levels of reference strength @@ -262,10 +284,10 @@ order of decreasing strength. This can be supported by running `Scanning::process_weak_refs` multiple times. If `process_weak_refs` returns `true`, it will be called again after all pending work packets in the -`VMRefClosure` stage has been executed. Those work packets include all work packets that compute -the transitive closure from objects resurrected during `process_weak_refs`. This allows the VM +`VMRefClosure` stage has been executed. Those pending work packets include all work packets that +compute the transitive closure from objects retained during `process_weak_refs`. This allows the VM binding to expand the transitive closure multiple times, each handling weak references at different -levels of reachability. +levels of strength. Take Java as an example, we may run `process_weak_refs` four times. @@ -274,13 +296,12 @@ Take Java as an example, we may run `process_weak_refs` four times. - forward the referent field. - If the referent is unreachable, then - if it is not emergency GC, then - - resurrect the referent and update the referent field. + - retain the referent and update the referent field. - it it is emergency GC, then - clear the referent field, remove the `SoftReference` from the list of soft references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. - - (This step may expand the transitive closure in emergency GC if any referents are - resurrected.) + - (This step may expand the transitive closure in emergency GC if any referents are retained.) 2. Visit all `WeakReference`. - If the referent is reachable, then - forward the referent field. 
@@ -293,7 +314,7 @@ Take Java as an example, we may run `process_weak_refs` four times. - forward the reference to it. - If the finalizable object is unreachable, then - remove it from the list of finalizable objects, and enqueue it for finalization. - - (This step may expand the transitive closure if any finalizable objects are resurrected.) + - (This step may expand the transitive closure if any finalizable objects are retained.) 4. Visit all `PhantomReference`. - If the referent is reachable, then - forward the referent field. @@ -306,8 +327,8 @@ Take Java as an example, we may run `process_weak_refs` four times. As an optimization, -- Step 1 can be eliminated by merging it with the strong closure in non-emergency GC, or with - `WeakReference` processing in emergency GC, as we described in the previous section. +- Step 1 can be, as we described in the previous section, eliminated by merging it with the strong + closure in non-emergency GC, or with `WeakReference` processing in emergency GC. - Step 2 can be merged with Step 3 since Step 2 never expands the transitive closure. Therefore, we only need to run `process_weak_refs` twice: @@ -345,7 +366,6 @@ fn process_weak_ref(...) -> bool { return false; // Proceed to the Release stage. } } - } ``` @@ -367,20 +387,20 @@ do the tracing. We maintain a queue of ephemerons which is empty before the `Cl 1. In `Scanning::scan_object` and `Scanning::scan_object_and_trace_edges`, we enqueue ephemerons as we scan them, but do not trace either the key or the value fields. 2. In `Scanning::process_weak_refs`, we iterate through all ephemerons in the queue. If the key of - an ephemeron is reached, but its value has not yet been reached, then resurrect its value, and + an ephemeron is reached, but its value has not yet been reached, then retain its value, and remove the ephemeron from the queue. Otherwise, keep the object in the queue. -3. 
If any value is resurrected, return `true` from `Scanning::process_weak_refs` so that it will be +3. If any value is retained, return `true` from `Scanning::process_weak_refs` so that it will be called again after the transitive closure from retained values are computed. Then go back to Step 2. -4. If no value is resurrected, the algorithm completes. The queue contains reachable ephemerons - that have unreachable keys. +4. If no value is retained, the algorithm completes. The queue contains reachable ephemerons that + have unreachable keys. This algorithm can be modified if we have a list of all ephemerons before GC starts. We no longer need to maintain the queue. - In Step 1, we don't need to enqueue ephemerons. -- In Step 2, we iterate through all ephemerons, and we resurrect the value if both the ephemeron - itself and the key are reached, and the value is not reached yet. We don't need to remove any +- In Step 2, we iterate through all ephemerons. We retain the value if both the ephemeron itself + and the key are reached, and the value is not reached yet. We don't need to remove any ephemeron from the list. - When the algorithm completes, we can identify both reachable and unreachable ephemerons that have unreachable keys. But we need to remove unreachable (dead) ephemerons from the list diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index e620f7bf63..e49c385d70 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -297,7 +297,7 @@ pub trait Scanning { /// 4. Request this function to be called again after transitive closure is finished again. /// - by returning `true` /// - /// The `tracer_context` parameter provides the VM binding the mechanism for resurrecting + /// The `tracer_context` parameter provides the VM binding the mechanism for retaining /// unreachable objects (i.e. keeping them alive in this GC). The following snippet shows a /// typical use case of handling finalizable objects for a Java-like language. 
/// @@ -308,11 +308,11 @@ pub trait Scanning { /// tracer_context.with_tracer(worker, |tracer| { /// for object in finalizable_objects { /// if object.is_reachable() { - /// // Object is still alive, and may be moved if it's copying GC. + /// // Object is still reachable, and may have been moved if it is a copying GC. /// let new_object = object.get_forwarded_object().unwrap_or(object); /// new_finalizable_objects.push(new_object); /// } else { - /// // Object is dead. resurrect it. + /// // Object is unreachable. Retain it. /// let new_object = tracer.trace_object(object); /// enqueue_finalizable_object_to_be_executed_later(new_object); /// } @@ -321,9 +321,9 @@ pub trait Scanning { /// ``` /// /// Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)` - /// to resurrect `object` and get its new address if moved. After `with_tracer` returns, it - /// will create work packets in the `VMRefClosure` work bucket to compute the transitive closure - /// from the objects resurrected in the closure. + /// to retain `object` and get its new address if moved. After `with_tracer` returns, it will + /// create work packets in the `VMRefClosure` work bucket to compute the transitive closure from + /// the objects retained in the closure. /// /// The `memory_manager::is_mmtk_object` function can be used in this function if /// - the "is_mmtk_object" feature is enabled, and @@ -331,21 +331,21 @@ pub trait Scanning { /// /// Arguments: /// * `worker`: The current GC worker. - /// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to resurrect and - /// update weak references. + /// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to retain and update + /// weak references. 
/// /// If `process_weak_refs` returns `true`, then `process_weak_refs` will be called again after /// all work packets in the `VMRefClosure` work bucket has been executed, by which time all - /// objects reachable from the objects resurrected in this function will have been reached. + /// objects reachable from the objects retained in this function will have been reached. /// /// # Performance notes /// - /// **Resurrect as many objects as needed in one invocation of `tracer_context.with_tracer`, and + /// **Retain as many objects as needed in one invocation of `tracer_context.with_tracer`, and /// avoid calling `with_tracer` again and again** for each object. The `tracer` provided by - /// `ObjectTracerFactory::with_tracer` enqueues resurrected objects in an internal list specific - /// to this invocation of `with_tracer`, and will create reasonably sized work packets to - /// compute the transitive closure. This means the invocation of `with_tracer` has a - /// non-trivial overhead, but each invocation of `tracer.trace_object` is cheap. + /// `ObjectTracerFactory::with_tracer` enqueues retained objects in an internal list specific to + /// this invocation of `with_tracer`, and will create reasonably sized work packets to compute + /// the transitive closure. This means the invocation of `with_tracer` has a non-trivial + /// overhead, but each invocation of `tracer.trace_object` is cheap. /// /// *Don't do this*: /// @@ -358,11 +358,11 @@ pub trait Scanning { /// ``` /// /// **Use `ObjectReference::get_forwarded_object()` to get the forwarded address of reachable - /// objects. Only use `tracer.trace_object` for resurrecting unreachable objects.** If + /// objects. Only use `tracer.trace_object` for retaining unreachable objects.** If /// `trace_object` is called on an already reached object, it will also return its new address /// if moved. 
However, `tracer_context.with_tracer` has a cost, and the VM binding may - /// accidentally resurrect unreachable objects if failed to check `object.is_reachable()` first. - /// If the VM binding does not intend to resurrect objects, it should completely avoid touching + /// accidentally "resurrect" dead objects if failed to check `object.is_reachable()` first. If + /// the VM binding does not intend to retain any objects, it should completely avoid touching /// `tracer_context`. /// /// **Clone the `tracer_context` for paralelism.** The `ObjectTracerContext` has `Clone` as its From b07dbd9ad75cd19e106e625dccd88e2febd52946 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 13 Feb 2025 14:49:20 +0800 Subject: [PATCH 15/23] Minor changes --- docs/userguide/src/portingguide/concerns/weakref.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index f43ef4459b..3ed66a103c 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -224,11 +224,10 @@ through all fields that contain weak references to objects. For each field, ### Identifying weak references -Weak references in *global slots*, including fields of global data structures as well as keys and/or -values in global weak tables, are relatively straightforward. We just need to enumerate them in -`Scanning::process_weak_refs`. +Weak references in fields of *global* (per-VM) data structures are relatively straightforward. We +just need to enumerate them in `Scanning::process_weak_refs`. -There are also *fields* in heap objects that hold weak references to other heap objects. There are +There are also fields in *heap objects* that hold weak references to other heap objects. There are two basic ways to identify them. 
- **Register on creation**: We may record objects that contain weak reference fields in a global From abf91d19054931a41eff9377cf314cada91b6b74 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Tue, 18 Feb 2025 14:15:36 +0800 Subject: [PATCH 16/23] Minor changes - Slightly rephrased the beginning of the overview section - Always use perfect tense when talking about "reached" and "not reached". --- .../src/portingguide/concerns/weakref.md | 90 ++++++++++--------- src/vm/scanning.rs | 2 +- 2 files changed, 51 insertions(+), 41 deletions(-) diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index 3ed66a103c..378d383f0c 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -12,7 +12,8 @@ In this chapter, we use the following definitions. They may be different from t concrete VMs. **Finalizers** are clean-up operations associated with an object, and are executed when the garbage -collector determines the object is no longer reachable. Depending on the VM, +collector determines the object is no longer reachable. Depending on the VM, finalizers may have +different properties. - Finalizers may be executed immediately during GC, or postponed to mutator time. - They may have access to the object body, or executed independently from the object. @@ -41,17 +42,19 @@ In other words, a Java `Reference` instance has a field that holds a weak refere ## Overview of MMTk's finalizer and weak reference processing API -During each GC, after the transitive closure is computed (i.e. after all objects strongly reachable -from roots have been reached), MMTk calls `Scanning::process_weak_refs` which is implemented by the -VM binding. Inside this function, the VM binding can do several things. +During each GC, MMTk core starts tracing from roots. 
It will follow strong references discovered by +`Scanning::scan_object` and `Scanning::scan_object_and_trace_edges`. After all strongly reachable +objects have been reached (i.e. the transitive closure including strongly reachable objects is +computed), MMTk will call `Scanning::process_weak_refs` which is implemented by the VM binding. +Inside this function, the VM binding can do several things. - **Query reachability**: The VM binding can query whether any given object has been reached. + Do this with `ObjectReference::is_reachable()`. -- **Query forwarded address**: If an object is already reached, the VM binding can further query - the new address of an object. This is needed to support copying GC. +- **Query forwarded address**: If an object has already been reached, the VM binding can further + query the new address of an object. This is needed to support copying GC. + Do this with `ObjectReference::get_forwarded_object()`. -- **Retain objects**: If an object is not reached at this time, the VM binding can optionally - retain the object. It will make that object *and all descendants* reachable, and keep them +- **Retain objects**: If an object has not been reached at this time, the VM binding can + optionally demand the object to be retained. That object *and all descendants* will be kept alive during this GC. + Do this with the `tracer_context` argument of `process_weak_refs`. - **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be @@ -67,7 +70,8 @@ operations, including (but not limited to) - **update fields** that contain weak references. - **Forward the field**: It can write the forwarded address of the referent if moved by a copying GC. - - **Clear the field**: It can clear the field if the referent is unreachable. + - **Clear the field**: It can clear the field if the referent has not been reached and the + binding decides it is unreachable. 
Using those primitive operations, the VM binding can support different flavors of finalizers and/or weak references. We will discuss common use cases in the following sections. @@ -147,7 +151,7 @@ impl Scanning for VMScanning { let new_object = object.get_forwarded_object().unwrap_or(object); new_finalizable_objects.push(new_object); } else { - // Object is unreachable. Retain it. + // Object is unreachable. Retain it, and enqueue for postponed execution. let new_object = tracer.trace_object(object); enqueue_finalizable_object_to_be_executed_later(new_object); } @@ -217,8 +221,8 @@ already moved. The general way to handle weak references is, after computing the transitive closure, iterate through all fields that contain weak references to objects. For each field, -- if the referent is already reached, write the new address of the object to the field (or do - nothing if the object is not moved); +- if the referent has already been reached, write the new address of the object to the field (or + do nothing if the object is not moved); - otherwise, clear the field, writing `null`, `nil`, or whatever represents a cleared weak reference to the field. @@ -264,13 +268,15 @@ treating them like `WeakReference` in emergency GCs**. - During non-emergency GC, we let `Scanning::scan_object` and `Scanning::scan_object_and_trace_edges` scan the weak reference field inside a `SoftReference` - instance as if it were an ordinary strong reference field. In this way, the (strong) transitive - closure after the `Closure` stage will also include softly reachable objects, and they will be - kept alive just like strongly reachable objects. + instance as if it were an ordinary strong reference field. In this way, softly reachable + objects will be included in the (strong) transitive closure from roots. By the first time + `Scanning::process_weak_refs` is called, softly reachable objects will have already been + reached (i.e. `object.is_reachable()` will be true).
They will be kept alive just like strongly + reachable objects. - During emergency GC, however, skip this field in `Scanning::scan_object` or `Scanning::scan_object_and_trace_edges`, and clear `SoftReference` just like `WeakReference` in - `Scanning::process_weak_refs`. In this way, softly reachable objects will be dead unless they - are subject to finalization. + `Scanning::process_weak_refs`. In this way, softly reachable objects will become unreachable + unless they are subject to finalization. The other way is **retaining referents of `SoftReference` after the strong closure**. This involves supporting multiple levels of reference strengths, which will be introduced in the next section. @@ -291,37 +297,41 @@ levels of strength. Take Java as an example, we may run `process_weak_refs` four times. 1. Visit all `SoftReference`. - - If the referent is reachable, then + - If the referent has been reached, then - forward the referent field. - - If the referent is unreachable, then + - If the referent has not been reached, yet, then - if it is not emergency GC, then - retain the referent and update the referent field. - it it is emergency GC, then - - clear the referent field, remove the `SoftReference` from the list of soft - references, and optionally enqueue it to the associated `ReferenceQueue` if it has - one. + - clear the referent field, + - remove the `SoftReference` from the list of soft references, and + - optionally enqueue it to the associated `ReferenceQueue` if it has one. - (This step may expand the transitive closure in emergency GC if any referents are retained.) 2. Visit all `WeakReference`. - - If the referent is reachable, then + - If the referent has been reached, then - forward the referent field. - - If the referent is unreachable, then - - clear the referent field, remove the `WeakReference` from the list of weak references, - and optionally enqueue it to the associated `ReferenceQueue` if it has one. 
+ - If the referent has not been reached, yet, then + - clear the referent field, + - remove the `WeakReference` from the list of weak references, and + - optionally enqueue it to the associated `ReferenceQueue` if it has one. - (This step cannot expand the transitive closure.) -3. Visit the list of finalizable objects (may be implemented as `FinalizerReference` by some JVMs). - - If the finalizable object is reachable, then - - forward the reference to it. - - If the finalizable object is unreachable, then - - remove it from the list of finalizable objects, and enqueue it for finalization. +3. Visit the list of finalizable objects. + - If the finalizable object has been reached, then + - forward the reference in the list. + - If the finalizable object has not been reached, yet, then + - retain the finalizable object, and + - remove it from the list of finalizable objects, and + - enqueue it for finalization. - (This step may expand the transitive closure if any finalizable objects are retained.) 4. Visit all `PhantomReference`. - - If the referent is reachable, then + - If the referent has been reached, then - forward the referent field. - (Note: `PhantomReference#get()` always returns `null`, but the actual referent field - shall hold a valid reference to the referent.) - - If the referent is unreachable, then - - clear the referent field, remove the `PhantomReference` from the list of phantom - references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. + shall hold a valid reference to the referent before it is cleared.) + - If the referent has not been reached, yet, then + - clear the referent field, + - remove the `PhantomReference` from the list of phantom references, and + - optionally enqueue it to the associated `ReferenceQueue` if it has one. - (This step cannot expand the transitive closure.) As an optimization, @@ -386,8 +396,8 @@ do the tracing. We maintain a queue of ephemerons which is empty before the `Cl 1. 
In `Scanning::scan_object` and `Scanning::scan_object_and_trace_edges`, we enqueue ephemerons as we scan them, but do not trace either the key or the value fields. 2. In `Scanning::process_weak_refs`, we iterate through all ephemerons in the queue. If the key of - an ephemeron is reached, but its value has not yet been reached, then retain its value, and - remove the ephemeron from the queue. Otherwise, keep the object in the queue. + an ephemeron has been reached, but its value has not yet been reached, then retain its value, + and remove the ephemeron from the queue. Otherwise, keep the object in the queue. 3. If any value is retained, return `true` from `Scanning::process_weak_refs` so that it will be called again after the transitive closure from retained values are computed. Then go back to Step 2. @@ -399,8 +409,8 @@ need to maintain the queue. - In Step 1, we don't need to enqueue ephemerons. - In Step 2, we iterate through all ephemerons. We retain the value if both the ephemeron itself - and the key are reached, and the value is not reached yet. We don't need to remove any - ephemeron from the list. + and the key have been reached, and the value has not been reached, yet. We don't need to remove + any ephemeron from the list. - When the algorithm completes, we can identify both reachable and unreachable ephemerons that have unreachable keys. But we need to remove unreachable (dead) ephemerons from the list because they will be recycled in the `Release` stage. diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index e49c385d70..ff5234c12a 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -312,7 +312,7 @@ pub trait Scanning { /// let new_object = object.get_forwarded_object().unwrap_or(object); /// new_finalizable_objects.push(new_object); /// } else { - /// // Object is unreachable. Retain it. + /// // Object is unreachable. Retain it, and enqueue for postponed execution. 
/// let new_object = tracer.trace_object(object); /// enqueue_finalizable_object_to_be_executed_later(new_object); /// } From aa0930dd28714c89b9400bd590300fd46edd108c Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Tue, 18 Feb 2025 15:18:05 +0800 Subject: [PATCH 17/23] Clarify the scope and return value of is_reachable --- src/util/address.rs | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/src/util/address.rs b/src/util/address.rs index 46b65f4bd9..36a27f30b5 100644 --- a/src/util/address.rs +++ b/src/util/address.rs @@ -638,6 +638,37 @@ impl ObjectReference { } /// Is the object reachable, determined by the policy? + /// + /// # Scope + /// + /// This method is primarily used during weak reference processing. It can check if an object + /// (particularly finalizable objects and objects pointed by weak references) has been reached + /// by following strong references or weak references of higher strength. + /// + /// This method can also be used during tracing for debug purposes. + /// + /// When called at other times, particularly during mutator time, the behavior is specific to + /// the implementation of the plan and policy due to their strategies of metadata clean-up. If + /// the VM needs to know if any given reference is still valid, it should instead use the valid + /// object bit (VO bit) metadata which is enabled by the Cargo feature "vo_bit". + /// + /// # Return value + /// + /// It returns `true` if one of the following is true: + /// + /// 1. The object has been traced (i.e. reached) since tracing started. + /// 2. The policy conservatively considers the object reachable even though it has not been + /// traced. + /// - Particularly, if the plan is generational, this method will return `true` if the + /// object is mature during nursery GC. + /// + /// Due to the conservativeness, if this method returns `true`, it does not necessarily mean the + /// object must be reachable from roots. 
In generational GC, mature objects can be unreachable + /// from roots while the GC chooses not to reclaim their memory during nursery GC. Conversely, + /// all young objects reachable from the remembered set are retained even though some mature + /// objects in the remembered set can be unreachable in the first place. (This is known as + /// *nepotism* in GC literature.) + /// /// Note: Objects in ImmortalSpace may have `is_live = true` but are actually unreachable. pub fn is_reachable(self) -> bool { unsafe { SFT_MAP.get_unchecked(self.to_raw_address()) }.is_reachable(self) From 8b2b89d874629f47d189612f3221cbae8103f7d9 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Tue, 18 Feb 2025 17:06:21 +0800 Subject: [PATCH 18/23] Use a mock test to host the example code. This makes sure the code compiles, and makes sure we notice it if the API of the `process_weak_refs` API is changed. --- .../src/portingguide/concerns/weakref.md | 26 +---- src/vm/scanning.rs | 10 +- .../mock_test_doc_weakref_code_example.rs | 101 ++++++++++++++++++ src/vm/tests/mock_tests/mod.rs | 1 + 4 files changed, 109 insertions(+), 29 deletions(-) create mode 100644 src/vm/tests/mock_tests/mock_test_doc_weakref_code_example.rs diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index 378d383f0c..c6c1256e5d 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -136,31 +136,7 @@ This parameter provides the necessary mechanism to retain objects and make them descendants) live through the current GC. 
The typical use pattern is: ```rust -impl Scanning for VMScanning { - fn process_weak_refs( - worker: &mut GCWorker, - tracer_context: impl ObjectTracerContext, - ) -> bool { - let finalizable_objects = ...; - let mut new_finalizable_objects = vec![]; - - tracer_context.with_tracer(worker, |tracer| { - for object in finalizable_objects { - if object.is_reachable() { - // Object is still reachable, and may have been moved if it is a copying GC. - let new_object = object.get_forwarded_object().unwrap_or(object); - new_finalizable_objects.push(new_object); - } else { - // Object is unreachable. Retain it, and enqueue for postponed execution. - let new_object = tracer.trace_object(object); - enqueue_finalizable_object_to_be_executed_later(new_object); - } - } - }); - - // more code ... - } -} +{{#include ../../../../../src/vm/tests/mock_tests/mock_test_doc_weakref_code_example.rs:process_weak_refs_finalization}} ``` Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)` to diff --git a/src/vm/scanning.rs b/src/vm/scanning.rs index ff5234c12a..d2237aebec 100644 --- a/src/vm/scanning.rs +++ b/src/vm/scanning.rs @@ -302,19 +302,21 @@ pub trait Scanning { /// typical use case of handling finalizable objects for a Java-like language. /// /// ```rust - /// let finalizable_objects = ...; + /// let finalizable_objects: Vec = my_vm::get_finalizable_object(); /// let mut new_finalizable_objects = vec![]; /// /// tracer_context.with_tracer(worker, |tracer| { /// for object in finalizable_objects { /// if object.is_reachable() { - /// // Object is still reachable, and may have been moved if it is a copying GC. + /// // `object` is still reachable. + /// // It may have been moved if it is a copying GC. /// let new_object = object.get_forwarded_object().unwrap_or(object); /// new_finalizable_objects.push(new_object); /// } else { - /// // Object is unreachable. Retain it, and enqueue for postponed execution. + /// // `object` is unreachable. 
+ /// // Retain it, and enqueue it for postponed finalization. /// let new_object = tracer.trace_object(object); - /// enqueue_finalizable_object_to_be_executed_later(new_object); + /// my_vm::enqueue_finalizable_object_to_be_executed_later(new_object); /// } /// } /// }); diff --git a/src/vm/tests/mock_tests/mock_test_doc_weakref_code_example.rs b/src/vm/tests/mock_tests/mock_test_doc_weakref_code_example.rs new file mode 100644 index 0000000000..d7be4e8fe6 --- /dev/null +++ b/src/vm/tests/mock_tests/mock_test_doc_weakref_code_example.rs @@ -0,0 +1,101 @@ +//! This module tests the example code in `Scanning::process_weak_refs` and `weakref.md` in the +//! Porting Guide. We only check if the example code compiles. We cannot actually run it because +//! we can't construct a `GCWorker`. + +use crate::{ + scheduler::GCWorker, + util::ObjectReference, + vm::{ObjectTracer, ObjectTracerContext, Scanning, VMBinding}, +}; + +use super::mock_test_prelude::MockVM; + +#[allow(dead_code)] // We don't construct this struct as we can't run it. +struct VMScanning; + +// Just to make the code example look better. +use MockVM as MyVM; + +// Placeholders for functions supposed to be implemented by the VM. +mod my_vm { + use crate::util::ObjectReference; + + pub fn get_finalizable_object() -> Vec<ObjectReference> { + unimplemented!() + } + + pub fn set_new_finalizable_objects(_objects: Vec<ObjectReference>) {} + + pub fn enqueue_finalizable_object_to_be_executed_later(_object: ObjectReference) {} +} + +// ANCHOR: process_weak_refs_finalization +impl Scanning<MyVM> for VMScanning { + fn process_weak_refs( + worker: &mut GCWorker<MyVM>, + tracer_context: impl ObjectTracerContext<MyVM>, + ) -> bool { + let finalizable_objects: Vec<ObjectReference> = my_vm::get_finalizable_object(); + let mut new_finalizable_objects = vec![]; + + tracer_context.with_tracer(worker, |tracer| { + for object in finalizable_objects { + if object.is_reachable() { + // `object` is still reachable. + // It may have been moved if it is a copying GC.
+ let new_object = object.get_forwarded_object().unwrap_or(object); + new_finalizable_objects.push(new_object); + } else { + // `object` is unreachable. + // Retain it, and enqueue it for postponed finalization. + let new_object = tracer.trace_object(object); + my_vm::enqueue_finalizable_object_to_be_executed_later(new_object); + } + } + }); + + my_vm::set_new_finalizable_objects(new_finalizable_objects); + + false + } + + // ... + // ANCHOR_END: process_weak_refs_finalization + + // Methods after this are placeholders. We only ensure they compile. + + fn scan_object::VMSlot>>( + _tls: crate::util::VMWorkerThread, + _object: ObjectReference, + _slot_visitor: &mut SV, + ) { + unimplemented!() + } + + fn notify_initial_thread_scan_complete(_partial_scan: bool, _tls: crate::util::VMWorkerThread) { + unimplemented!() + } + + fn scan_roots_in_mutator_thread( + _tls: crate::util::VMWorkerThread, + _mutator: &'static mut crate::Mutator, + _factory: impl crate::vm::RootsWorkFactory<::VMSlot>, + ) { + unimplemented!() + } + + fn scan_vm_specific_roots( + _tls: crate::util::VMWorkerThread, + _factory: impl crate::vm::RootsWorkFactory<::VMSlot>, + ) { + unimplemented!() + } + + fn supports_return_barrier() -> bool { + unimplemented!() + } + + fn prepare_for_roots_re_scanning() { + unimplemented!() + } +} diff --git a/src/vm/tests/mock_tests/mod.rs b/src/vm/tests/mock_tests/mod.rs index 114d8f1859..aab9aafd8b 100644 --- a/src/vm/tests/mock_tests/mod.rs +++ b/src/vm/tests/mock_tests/mod.rs @@ -67,3 +67,4 @@ mod mock_test_vm_layout_log_address_space; mod mock_test_doc_avoid_resolving_allocator; mod mock_test_doc_mutator_storage; +mod mock_test_doc_weakref_code_example; From caefc0ce658f007d5e8800e2ed54ead29d5bee5b Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Tue, 18 Feb 2025 17:09:05 +0800 Subject: [PATCH 19/23] Fix typo --- docs/userguide/src/portingguide/concerns/weakref.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index c6c1256e5d..06f0ed26b4 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -326,7 +326,7 @@ To implement this, the VM binding may need to implement some kind of *state mach `Scanning::process_weak_refs` function behaves differently each time it is called. For example, ```rust -fn process_weak_ref(...) -> bool { +fn process_weak_refs(...) -> bool { let mut state = /* Get VM-specific states here. */; match *state { From 540f4772eb6103f2f03b3551c8129d4fa204eb05 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Tue, 18 Feb 2025 17:19:28 +0800 Subject: [PATCH 20/23] Link to .md instead of .html --- docs/userguide/src/portingguide/concerns/weakref.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/userguide/src/portingguide/concerns/weakref.md b/docs/userguide/src/portingguide/concerns/weakref.md index 06f0ed26b4..2f138dfb2a 100644 --- a/docs/userguide/src/portingguide/concerns/weakref.md +++ b/docs/userguide/src/portingguide/concerns/weakref.md @@ -30,7 +30,7 @@ different properties. The garbage collector may reclaim weakly reachable objects, clear weak references to weakly reachable objects, and/or performing associated clean-up operations. 
-[object graph]: ../../glossary.html#object-graph +[object graph]: ../../glossary.md#object-graph **A note for Java programmers**: In Java, the term "weak reference" often refers to instances of `java.lang.ref.Reference` (including the concrete classes `SoftReference`, `WeakReference`, From 5a9bd5fcec4ef8384012697b1285a836c24978e0 Mon Sep 17 00:00:00 2001 From: Kunshan Wang Date: Thu, 20 Feb 2025 14:25:22 +0800 Subject: [PATCH 21/23] Define emergency GC in the glossary --- docs/userguide/src/glossary.md | 16 ++++++++++++++++ src/mmtk.rs | 6 +++--- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/docs/userguide/src/glossary.md b/docs/userguide/src/glossary.md index 3b4cdc38a6..c04a2faeb2 100644 --- a/docs/userguide/src/glossary.md +++ b/docs/userguide/src/glossary.md @@ -30,6 +30,22 @@ garbage, and can be reclaimed by the garbage collector. TODO +## Emergency Collection + +Also known as: *emergency GC* + +In MMTk, an emergency collection happens when a normal collection cannot reclaim enough memory to +satisfy allocation requests. Plans may do full-heap GC, defragmentation, etc. during emergency +collections in order to free up more memory. + +VM bindings can call `MMTK::is_emergency_collection` to query if the current GC is an emergency GC. +During emergency GC, the VM binding is recommended to retain fewer objects than normal GCs, to the +extent allowed by the specification of the VM or the language. For example, the VM binding may +choose not to retain objects used for caching. Specifically, for Java virtual machines, that means +not retaining referents of [`SoftReference`][java-soft-ref] which is primarily designed for +implementing memory-sensitive caches. + +[java-soft-ref]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html
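The retention advice in the emergency-collection glossary entry can be sketched as a small policy function. This is a minimal, standalone sketch and not part of the patch or of the MMTk API: the `Strength` enum and `should_retain_referent` are hypothetical names, and the `is_emergency` parameter only stands in for the value a binding would obtain from `MMTK::is_emergency_collection`. Retaining soft referents during non-emergency GCs is one possible policy assumed here for illustration.

```rust
/// Hypothetical reference strengths, loosely modelled on Java's
/// `SoftReference` and `WeakReference`. Not an MMTk type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strength {
    Soft,
    Weak,
}

/// Decide whether a referent that is otherwise unreachable should be
/// retained (that is, traced) during the current GC.
///
/// `is_emergency` is an assumption: it stands in for the result of
/// `MMTK::is_emergency_collection` in a real binding.
fn should_retain_referent(strength: Strength, is_emergency: bool) -> bool {
    match strength {
        // Soft referents back memory-sensitive caches, so this sketch
        // keeps them in normal GCs and drops them in emergency GCs.
        Strength::Soft => !is_emergency,
        // Weak referents are never retained by the reference itself.
        Strength::Weak => false,
    }
}

fn main() {
    for is_emergency in [false, true] {
        for strength in [Strength::Soft, Strength::Weak] {
            println!(
                "{:?} emergency={} retain={}",
                strength,
                is_emergency,
                should_retain_referent(strength, is_emergency)
            );
        }
    }
}
```

In a real binding, a decision like this would presumably be made inside `Scanning::process_weak_refs`, with retained referents passed to `tracer.trace_object` as in the finalization example in this patch series.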