|  | 
|  | 1 | +# Finalizers and Weak References | 
|  | 2 | + | 
|  | 3 | +Some VMs support **finalizers**.  In simple terms, finalizers are clean-up operations associated | 
|  | 4 | +with an object, and are executed when the object is dead. | 
|  | 5 | + | 
|  | 6 | +Some VMs support **weak references**.  If an object cannot be reached from roots following only | 
|  | 7 | +strong references, the object will be considered dead.  Weak references to dead objects will be | 
|  | 8 | +cleared, and associated clean-up operations will be executed.  Some VMs also support more complex | 
|  | 9 | +weak data structures, such as weak hash tables, where keys, values, or both, can be weak references. | 
|  | 10 | + | 
|  | 11 | +The concrete semantics of finalizers and weak references vary from VM to VM, but MMTk provides a | 
|  | 12 | +low-level API that allows VM bindings to implement their own flavours of finalizers and weak | 
|  | 13 | +references on top of it. | 
|  | 14 | + | 
|  | 15 | +**A note for Java programmers**: In Java, the term "weak reference" often refers to instances of | 
|  | 16 | +`java.lang.ref.Reference` (including the concrete classes `SoftReference`, `WeakReference`, | 
|  | 17 | +`PhantomReference` and the hidden `FinalizerReference` class used by some JVM implementations to | 
|  | 18 | +implement finalizers).  Instances of `Reference` are proper Java heap objects, but each instance has | 
|  | 19 | +a field that contains a pointer to the referent, and the field can be cleared when the referent | 
|  | 20 | +dies.  In this article, we use the term "weak reference" to refer to the pointer inside that field. | 
|  | 21 | +In other words, a Java `Reference` instance has a field that holds a weak reference to the referent. | 
|  | 22 | + | 
|  | 23 | +## Overview | 
|  | 24 | + | 
|  | 25 | +During each GC, after the transitive closure is computed, MMTk calls `Scanning::process_weak_refs` | 
|  | 26 | +which is implemented by the VM binding.  Inside this function, the VM binding can do several things. | 
|  | 27 | + | 
|  | 28 | +-   **Query reachability**: The VM binding can query whether any given object has been reached in | 
|  | 29 | +    the transitive closure. | 
|  | 30 | +    -   **Query forwarded address**: If an object is already reached, the VM binding can further | 
|  | 31 | +        query the new address of an object.  This is needed to support copying GC. | 
|  | 32 | +    -   **Retain object**: If an object is not reached, the VM binding can optionally request to | 
|  | 33 | +        retain (i.e.  "resurrect") the object.  It will keep that object *and all descendants* | 
|  | 34 | +        alive. | 
|  | 35 | +-   **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be | 
|  | 36 | +    *called again* after computing the transitive closure that includes *retained objects and their | 
|  | 37 | +    descendants*.  This helps handle multiple levels of weak reference strength. | 
|  | 38 | + | 
|  | 39 | +Concretely, | 
|  | 40 | + | 
|  | 41 | +-   `ObjectReference::is_reachable()` queries reachability, | 
|  | 42 | +-   `ObjectReference::get_forwarded_object()` queries the forwarded address, | 
|  | 43 | +-   the `tracer_context` argument passed to the `Scanning::process_weak_refs` function can retain | 
|  | 44 | +    objects, and | 
|  | 45 | +-   returning `true` from `Scanning::process_weak_refs` causes it to be called again. | 
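
The "call again" protocol in the last bullet can be pictured with a small mock. This is only a sketch of the control flow, not MMTk code: `WeakProcessingState`, `mock_process_weak_refs`, `mock_gc_driver` and the level names are hypothetical stand-ins, and a real GC would recompute the transitive closure between invocations.

```rust
/// Strength levels processed in order, strongest first (illustrative names only).
const LEVELS: [&str; 3] = ["soft", "weak", "phantom"];

/// Hypothetical binding-side state that must survive between invocations
/// within one GC.
#[derive(Default)]
pub struct WeakProcessingState {
    next_level: usize,
    /// Records which level each invocation handled.
    pub log: Vec<&'static str>,
}

/// Mock of the binding's callback: handles one strength level per call and
/// returns `true` while more levels remain, i.e. "please call me again".
pub fn mock_process_weak_refs(state: &mut WeakProcessingState) -> bool {
    if state.next_level >= LEVELS.len() {
        // Nothing left to do.
        return false;
    }
    state.log.push(LEVELS[state.next_level]);
    state.next_level += 1;
    state.next_level < LEVELS.len()
}

/// Mock of the GC side: keeps invoking the callback until it returns `false`.
/// Returns the total number of invocations.
pub fn mock_gc_driver(state: &mut WeakProcessingState) -> usize {
    let mut invocations = 1;
    while mock_process_weak_refs(state) {
        // A real GC would trace retained objects and their descendants here
        // before invoking the callback again.
        invocations += 1;
    }
    invocations
}
```

Keeping the "which level comes next" state outside the callback is the design point: each invocation sees the closure as extended by the objects retained in the previous one.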
|  | 46 | + | 
|  | 47 | +The `Scanning::process_weak_refs` function also gives the VM binding a chance to perform other | 
|  | 48 | +operations, including (but not limited to) | 
|  | 49 | + | 
|  | 50 | +-   **Do clean-up operations**: The VM binding can perform clean-up operations, or queue them to be | 
|  | 51 | +    executed after GC. | 
|  | 52 | +-   **Update fields**: The VM binding can update fields that contain weak references. | 
|  | 53 | +    -   **Forward the field**: It can write the forwarded address of the referent if moved by a | 
|  | 54 | +        copying GC. | 
|  | 55 | +    -   **Clear the field**: It can clear the field if the referent is unreachable. | 
|  | 56 | + | 
|  | 57 | +Using those primitive operations, the VM binding can support different flavours of finalizers and/or | 
|  | 58 | +weak references.  We will discuss different use cases in the following sections. | 
|  | 59 | + | 
|  | 60 | +## Support finalizers | 
|  | 61 | + | 
|  | 62 | +Different VMs define "finalizer" differently, but they all involve performing operations when an | 
|  | 63 | +object is dead.  The general way to handle finalizers is to visit all **finalizable objects** (i.e. | 
|  | 64 | +objects that have associated finalization operations), check whether they are dead and, if so, | 
|  | 65 | +run or schedule their finalizers. | 
|  | 66 | + | 
|  | 67 | +### Identify finalizable objects | 
|  | 68 | + | 
|  | 69 | +Some VMs determine whether an object is finalizable by its type.  In Java, for example, an object is | 
|  | 70 | +finalizable if its `finalize()` method is overridden.  We can register instances of such types when | 
|  | 71 | +they are constructed. | 
|  | 72 | + | 
|  | 73 | +Some VMs can attach finalizing operations to an object after it is created.  The VM can maintain a | 
|  | 74 | +list of objects with attached finalizers, or maintain a (weak) hash map that maps finalizable | 
|  | 75 | +objects to their associated finalizers. | 
|  | 76 | + | 
|  | 77 | +### When to run finalizers? | 
|  | 78 | + | 
|  | 79 | +Depending on the semantics, finalizers can be executed during GC or during mutator time after GC. | 
|  | 80 | + | 
|  | 81 | +The VM binding can run finalizers in `Scanning::process_weak_refs` after finding a finalizable | 
|  | 82 | +object dead.  Bear in mind that MMTk usually runs with multiple GC workers, so the VM binding can | 
|  | 83 | +parallelise the operations by creating work packets.  The `Scanning::process_weak_refs` function is | 
|  | 84 | +executed in the `VMRefClosure` stage, so the created work packets should be added to the same bucket. | 
|  | 85 | + | 
|  | 86 | +If the finalizers should be executed after GC, the VM binding should enqueue them to VM-specific | 
|  | 87 | +queues so that they can be picked up after GC. | 
|  | 88 | + | 
|  | 89 | +### Reading the body of dead objects | 
|  | 90 | + | 
|  | 91 | +In some VMs, finalizers can read the fields in dead objects.  Such fields usually include | 
|  | 92 | +information needed for cleaning up resources held by the object, such as file descriptors and | 
|  | 93 | +pointers to memory or objects not managed by GC. | 
|  | 94 | + | 
|  | 95 | +`Scanning::process_weak_refs` is executed in the `VMRefClosure` stage, which happens after the | 
|  | 96 | +strong transitive closure (including all objects reachable from roots following only strong | 
|  | 97 | +references) has been computed, but before any object has been released (which happens in the | 
|  | 98 | +`Release` stage).  This means the bodies of all objects, live or dead, can still be accessed during | 
|  | 99 | +this stage. | 
|  | 100 | + | 
|  | 101 | +Therefore, if the VM needs to execute finalizers during GC, the VM binding can execute them in | 
|  | 102 | +`process_weak_refs`, or create work packets in the `VMRefClosure` stage. | 
|  | 103 | + | 
|  | 104 | +However, if the VM needs to execute finalizers after GC, there is a problem: by then the dead | 
|  | 105 | +object will have been reclaimed, and its memory may be overwritten by other objects.  In this case, | 
|  | 106 | +the VM will need to "resurrect" the dead object. | 
|  | 107 | + | 
|  | 108 | +### Resurrecting dead objects | 
|  | 109 | + | 
|  | 110 | +Some VMs, particularly the Java VM, execute finalizers during mutator time.  The dead finalizable | 
|  | 111 | +objects must be brought back to life so that they can still be accessed after the GC. | 
|  | 112 | + | 
|  | 113 | +`Scanning::process_weak_refs` has a parameter `tracer_context: impl ObjectTracerContext<VM>`. | 
|  | 114 | +This parameter provides the necessary mechanism to retain (i.e. "resurrect") objects and make them | 
|  | 115 | +(and their descendants) live through the current GC.  The typical use pattern is: | 
|  | 116 | + | 
|  | 117 | +```rust | 
|  | 118 | +impl<VM: VMBinding> Scanning<VM> for VMScanning { | 
|  | 119 | +    fn process_weak_refs( | 
|  | 120 | +        worker: &mut GCWorker<VM>, | 
|  | 121 | +        tracer_context: impl ObjectTracerContext<VM>, | 
|  | 122 | +    ) -> bool { | 
|  | 123 | +        let finalizable_objects = ...; | 
|  | 124 | +        let mut new_finalizable_objects = vec![]; | 
|  | 125 | + | 
|  | 126 | +        tracer_context.with_tracer(worker, |tracer| { | 
|  | 127 | +            for object in finalizable_objects { | 
|  | 128 | +                if object.is_reachable() { | 
|  | 129 | +                    // Object is still alive, and may be moved if it's copying GC. | 
|  | 130 | +                    let new_object = object.get_forwarded_object().unwrap_or(object); | 
|  | 131 | +                    new_finalizable_objects.push(new_object); | 
|  | 132 | +                } else { | 
|  | 133 | +                    // Object is dead.  Retain it. | 
|  | 134 | +                    let new_object = tracer.trace_object(object); | 
|  | 135 | +                    enqueue_finalizable_object_to_be_executed_later(new_object); | 
|  | 136 | +                } | 
|  | 137 | +            } | 
|  | 138 | +        }); | 
|  | 139 | + | 
|  | 140 | +        // more code ... | 
|  | 141 | +    } | 
|  | 142 | +} | 
|  | 143 | +``` | 
|  | 144 | + | 
|  | 145 | +The `tracer` parameter of the closure is an `ObjectTracer`.  It provides the `trace_object` method | 
|  | 146 | +which retains an object and returns the forwarded address. | 
|  | 147 | + | 
|  | 148 | +`tracer_context.with_tracer` creates a temporary `ObjectTracer` instance which the VM binding can | 
|  | 149 | +use within the given closure.  Objects retained by `trace_object` in the closure are enqueued. | 
|  | 150 | +After the closure returns, `with_tracer` will create reasonably-sized work packets for tracing the | 
|  | 151 | +retained objects and their descendants.  Therefore, the VM binding is encouraged to use one | 
|  | 152 | +`with_tracer` invocation to retain as many objects as needed.  Do not call `with_tracer` too often, | 
|  | 153 | +or it will create too many small work packets, which hurts performance. | 
|  | 154 | + | 
|  | 155 | +Keep in mind that **`ObjectTracerContext` implements `Clone`**.  If the VM has many finalizable | 
|  | 156 | +objects, it is advisable to split the list of finalizable objects into smaller chunks.  Create one | 
|  | 157 | +work packet for each chunk, and give each work packet a clone of `tracer_context` so that multiple | 
|  | 158 | +work packets can process finalizable objects in parallel. | 
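
The chunking advice above might look roughly like the following sketch. It deliberately avoids the real MMTk types: `MockTracerContext` stands in for a cloneable `ObjectTracerContext`, `FinalizationPacket` for a binding-defined work packet, and plain `usize` values for object references. All three, as well as the chunk size, are assumptions made for illustration.

```rust
/// Stand-in for a cloneable tracer context; not MMTk's `ObjectTracerContext`.
#[derive(Clone)]
pub struct MockTracerContext;

/// Hypothetical work packet that owns one chunk of finalizable objects
/// together with its own clone of the tracer context.
pub struct FinalizationPacket {
    pub objects: Vec<usize>,
    pub context: MockTracerContext,
}

/// Splits `objects` into chunks of at most `chunk_size` elements and builds
/// one packet per chunk, each carrying a clone of `context`.
pub fn split_into_packets(
    objects: &[usize],
    chunk_size: usize,
    context: &MockTracerContext,
) -> Vec<FinalizationPacket> {
    // `slice::chunks` panics on a zero chunk size, so fail with a clear message.
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    objects
        .chunks(chunk_size)
        .map(|chunk| FinalizationPacket {
            objects: chunk.to_vec(),
            context: context.clone(),
        })
        .collect()
}
```

Each packet would then make a single `with_tracer` call over its own chunk, so the number of `with_tracer` invocations scales with the number of chunks rather than the number of objects.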
|  | 159 | + | 
|  | 160 | + | 
|  | 161 | +## Support weak references | 
|  | 162 | + | 
|  | 163 | +The general way to handle weak references is to iterate, after computing the transitive closure, | 
|  | 164 | +through all fields that contain weak references to objects.  For each field, | 
|  | 165 | + | 
|  | 166 | +-   if the referent is already reached, write the new address of the object to the field (or do | 
|  | 167 | +    nothing if the object is not moved); | 
|  | 168 | +-   otherwise, clear the field, writing `null`, `nil`, or whatever represents a cleared weak | 
|  | 169 | +    reference to the field. | 
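
The two cases above can be sketched as a self-contained mock. None of the names below are MMTk API: `ObjRef` stands in for an object reference, `ClosureResult` is a hypothetical table recording which objects were reached and where they now live, and `None` plays the role of a cleared weak reference. A real binding would query reachability and forwarded addresses through MMTk instead.

```rust
use std::collections::HashMap;

/// Stand-in for an object address; not MMTk's `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Hypothetical record of the transitive closure: every reached object maps
/// to its address after GC (the same address if it was not moved).
pub struct ClosureResult {
    pub reached: HashMap<ObjRef, ObjRef>,
}

impl ClosureResult {
    /// Returns the post-GC address if `object` was reached, otherwise `None`.
    pub fn new_address(&self, object: ObjRef) -> Option<ObjRef> {
        self.reached.get(&object).copied()
    }
}

/// Processes weak fields in place; `None` represents a cleared field.
/// Returns the number of fields cleared by this call.
pub fn process_weak_fields(fields: &mut [Option<ObjRef>], closure: &ClosureResult) -> usize {
    let mut cleared = 0;
    for field in fields.iter_mut() {
        if let Some(referent) = *field {
            match closure.new_address(referent) {
                // Referent was reached: store its (possibly new) address.
                Some(new_address) => *field = Some(new_address),
                // Referent is dead: clear the field.
                None => {
                    *field = None;
                    cleared += 1;
                }
            }
        }
    }
    cleared
}
```

Fields that were already cleared before the GC are skipped, so they are not counted again.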
|  | 170 | + | 
|  | 171 | +### Identify weak references | 
|  | 172 | + | 
|  | 173 | +Weak references in the fields in global data structures, including keys and/or values in global weak | 
|  | 174 | +tables, are relatively straightforward.  We just need to enumerate them in | 
|  | 175 | +`Scanning::process_weak_refs`. | 
|  | 176 | + | 
|  | 177 | +There are also fields in heap objects that hold weak references to other heap objects.  There | 
|  | 178 | +are two basic ways to identify them. | 
|  | 179 | + | 
|  | 180 | +-   **Register on creation**: We may record objects that contain such fields in a global list when | 
|  | 181 | +    such objects are created.  In `Scanning::process_weak_refs`, we just need to iterate through | 
|  | 182 | +    this list, process the fields, and remove dead objects from the list. | 
|  | 183 | +-   **Discover objects during tracing**: While computing the transitive closure, we scan objects and | 
|  | 184 | +    discover objects that contain weak reference fields.  We enqueue such objects into a list, and | 
|  | 185 | +    iterate through the list in `Scanning::process_weak_refs` after the transitive closure is | 
|  | 186 | +    computed.  The list needs to be reconstructed in each GC. | 
|  | 187 | + | 
|  | 188 | +Both methods work, but each has its advantages and disadvantages.  Registering on creation does not | 
|  | 189 | +need to reconstruct the list in every GC, while discovering during tracing can avoid visiting dead | 
|  | 190 | +objects.  Depending on the nature of your VM, one method may be easier to implement than the other, | 
|  | 191 | +especially if your VM's existing GC has already implemented weak reference processing in some way. | 
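
As an illustration of the register-on-creation approach, the sketch below keeps a global-style list of holder objects and prunes it after the transitive closure. It is a mock under stated assumptions: `WeakHolderRegistry` and its methods are hypothetical, objects are represented by plain `usize` addresses, and the `reached` map (reached object to post-GC address) replaces the reachability and forwarding queries a real binding would make.

```rust
use std::collections::HashMap;

/// Hypothetical registry of objects that contain weak reference fields.
#[derive(Default)]
pub struct WeakHolderRegistry {
    holders: Vec<usize>,
}

impl WeakHolderRegistry {
    /// Called when an object with weak reference fields is created.
    pub fn register(&mut self, holder: usize) {
        self.holders.push(holder);
    }

    /// Called after the transitive closure: drops dead holders and replaces
    /// each surviving holder with its post-GC address.
    pub fn prune_and_forward(&mut self, reached: &HashMap<usize, usize>) {
        self.holders = self
            .holders
            .iter()
            .filter_map(|holder| reached.get(holder).copied())
            .collect();
    }

    /// The holders currently registered, in registration order.
    pub fn holders(&self) -> &[usize] {
        &self.holders
    }
}
```

Unlike a list rebuilt during tracing, this list persists across GCs, which is why dead holders have to be removed explicitly.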
|  | 192 | + | 
|  | 193 | +### Multiple levels of strength | 
|  | 194 | + | 
|  | 195 | +Some VMs, such as the Java VM, support multiple levels of weak reference strength. | 
|  | 196 | + | 
|  | 197 | + | 
|  | 198 | + | 
|  | 199 | + | 
|  | 200 | + | 
|  | 201 | +<!-- | 
|  | 202 | +vim: tw=100 ts=4 sw=4 sts=4 et | 
|  | 203 | +--> | 