Commit 84545cc

authored
Special topic chapter for finalizers and weak references (mmtk#1265)
This PR adds a special topic chapter in the Porting Guide for supporting finalizers and weak references. This topic is frequently asked about and is somewhat complex, so it needs a dedicated chapter. We also updated the doc comments of the `Scanning::process_weak_refs` API to add a code example of the intended use case and to warn users about potential pitfalls.
1 parent df5e0cd commit 84545cc

9 files changed: 757 additions, 48 deletions

docs/userguide/src/SUMMARY.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,8 @@
 
 [Introduction](README.md)
 
+[Glossary](glossary.md)
+
 # For GC Developers
 
 - [Tutorial: Add a new GC plan to MMTk](tutorial/prefix.md)
@@ -36,6 +38,8 @@
 - [Performance Tuning](portingguide/perf_tuning/prefix.md)
 - [Link Time Optimization](portingguide/perf_tuning/lto.md)
 - [Optimizing Allocation](portingguide/perf_tuning/alloc.md)
+- [VM-specific Concerns](portingguide/concerns/prefix.md)
+- [Finalizers and Weak References](portingguide/concerns/weakref.md)
 - [API Migration Guide](migration/prefix.md)
 - [Template (for mmtk-core developers)](migration/template.md)
 

docs/userguide/src/glossary.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# Glossary
+
+This document explains basic concepts of garbage collection. MMTk uses those terms as described in
+this document. Different VMs may define some terms differently. Should there be any confusion,
+this document will help disambiguate them. We use the book [*The Garbage Collection Handbook: The
+Art of Automatic Memory Management*][GCHandbook] as the primary reference.
+
+[GCHandbook]: https://gchandbook.org/
+
+## Object graph
+
+The object graph is a graph-theory view of the garbage-collected heap. An **object graph** is a
+directed graph that contains *nodes* and *edges*. An edge always points to a node. But unlike
+conventional graphs, an edge may originate from either another node or a *root*.
+
+Each *node* represents an object in the heap.
+
+Each *edge* represents an object reference from an object or a root. A *root* is a reference held
+in a slot directly accessible from [mutators][mutator], including local variables, global variables,
+thread-local variables, and so on. An object can have many fields, and some fields may hold
+references to objects, while others hold non-reference values.
+
+An object is *reachable* if there is a path in the object graph from some root to the node of the
+object. Unreachable objects cannot be accessed by [mutators][mutator]. They are considered
+garbage, and can be reclaimed by the garbage collector.
+
+[mutator]: #mutator
+
+## Mutator
+
+TODO
+
+## Emergency Collection
+
+Also known as: *emergency GC*
+
+In MMTk, an emergency collection happens when a normal collection cannot reclaim enough memory to
+satisfy allocation requests. Plans may do full-heap GC, defragmentation, etc. during emergency
+collections in order to free up more memory.
+
+VM bindings can call `MMTK::is_emergency_collection` to query if the current GC is an emergency GC.
+During emergency GC, the VM binding is encouraged to retain fewer objects than in normal GCs, to the
+extent allowed by the specification of the VM or the language. For example, the VM binding may
+choose not to retain objects used for caching. Specifically, for Java virtual machines, that means
+not retaining referents of [`SoftReference`][java-soft-ref], which is primarily designed for
+implementing memory-sensitive caches.
+
+[java-soft-ref]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html
+
+<!--
+vim: tw=100 ts=4 sw=4 sts=4 et
+-->
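To make the glossary's definition of reachability concrete, the sketch below computes the set of reachable nodes from a list of roots with a breadth-first traversal. It is a toy model under stated assumptions. Objects are identified by hypothetical `usize` indices, `ObjectGraph` is an illustrative type (not part of MMTk), and each object's reference fields are stored as an adjacency list.

```rust
use std::collections::VecDeque;

/// A toy object graph: `fields[i]` lists the objects referenced by object `i`.
/// Object IDs are plain indices; this is an illustrative model, not an MMTk type.
pub struct ObjectGraph {
    pub fields: Vec<Vec<usize>>,
}

impl ObjectGraph {
    /// Returns a vector where `result[i]` is true if object `i` is reachable
    /// from at least one root. Each root is an edge from outside the heap.
    pub fn reachable_from(&self, roots: &[usize]) -> Vec<bool> {
        let mut reached = vec![false; self.fields.len()];
        let mut queue: VecDeque<usize> = VecDeque::new();
        // Every node a root points to is reachable.
        for &root in roots {
            if !reached[root] {
                reached[root] = true;
                queue.push_back(root);
            }
        }
        // Follow edges out of reached nodes until nothing new is discovered.
        while let Some(obj) = queue.pop_front() {
            for &child in &self.fields[obj] {
                if !reached[child] {
                    reached[child] = true;
                    queue.push_back(child);
                }
            }
        }
        reached
    }
}
```

With `fields = [[1], [2], [], [0], [4]]` and root `0`, objects 0, 1 and 2 are reachable. Object 3 (which points into the live part of the graph) and object 4 (which points only to itself) are garbage. Reachability requires a path starting *from* a root.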
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# VM-specific Concerns
+
+Every VM is special in some way. Because of this, some VM bindings may use MMTk features that
+most VMs do not use, and may even deviate from the usual steps of integrating MMTk into the VM.
+Here we provide special guides to cover such cases.

docs/userguide/src/portingguide/concerns/weakref.md

Lines changed: 484 additions & 0 deletions
Large diffs are not rendered by default.

src/mmtk.rs

Lines changed: 3 additions & 3 deletions
@@ -382,13 +382,13 @@ impl<VM: VMBinding> MMTK<VM> {
 /// Return true if the current GC is an emergency GC.
 ///
 /// An emergency GC happens when a normal GC cannot reclaim enough memory to satisfy allocation
-/// requests. Plans may do full-heap GC, defragmentation, etc. during emergency in order to
+/// requests. Plans may do full-heap GC, defragmentation, etc. during emergency GCs in order to
 /// free up more memory.
 ///
 /// VM bindings can call this function during GC to check if the current GC is an emergency GC.
 /// If it is, the VM binding is recommended to retain fewer objects than normal GCs, to the
-/// extent allowed by the specification of the VM or langauge. For example, the VM binding may
-/// choose not to retain objects used for caching. Specifically, for Java virtual machines,
+/// extent allowed by the specification of the VM or the language. For example, the VM binding
+/// may choose not to retain objects used for caching. Specifically, for Java virtual machines,
 /// that means not retaining referents of [`SoftReference`][java-soft-ref] which is primarily
 /// designed for implementing memory-sensitive caches.
 ///
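The doc comment suggests retaining fewer objects during emergency GCs, using Java's `SoftReference` as the example. Below is a minimal sketch of such a policy, under stated assumptions. `SoftRef` is a hypothetical binding-side record, the `is_emergency` flag stands in for the result of `MMTK::is_emergency_collection`, and the rule (retain all softly reachable referents in normal GCs, none in emergency GCs) is deliberately simplistic. Real JVMs may use more nuanced heuristics.

```rust
/// A hypothetical binding-side record of a soft reference (not an MMTk type).
#[derive(Debug, Clone, PartialEq)]
pub struct SoftRef {
    /// The referent, or `None` if it has been cleared.
    pub referent: Option<usize>,
    /// Whether the referent was already reached through strong references.
    pub referent_strongly_reachable: bool,
}

/// Decide, for each soft reference, whether its referent should be retained.
///
/// `is_emergency` stands in for `MMTK::is_emergency_collection()`. Returns the
/// referents the binding would ask MMTk to retain; in an emergency GC the
/// softly reachable referents are cleared instead.
pub fn process_soft_refs(refs: &mut [SoftRef], is_emergency: bool) -> Vec<usize> {
    let mut to_retain = Vec::new();
    for r in refs.iter_mut() {
        let Some(referent) = r.referent else { continue };
        if r.referent_strongly_reachable {
            // Already alive: nothing to retain and nothing to clear.
            continue;
        }
        if is_emergency {
            // Emergency GC: do not keep cached objects alive.
            r.referent = None;
        } else {
            // Normal GC: keep the softly reachable referent alive.
            to_retain.push(referent);
        }
    }
    to_retain
}
```

In a normal GC the function returns the referents the binding would retain (for example with `tracer.trace_object`). In an emergency GC it clears them instead. Referents already reached through strong references are left untouched either way.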

src/util/address.rs

Lines changed: 31 additions & 0 deletions
@@ -638,6 +638,37 @@ impl ObjectReference {
 }
 
 /// Is the object reachable, determined by the policy?
+///
+/// # Scope
+///
+/// This method is primarily used during weak reference processing. It can check if an object
+/// (particularly finalizable objects and objects pointed to by weak references) has been reached
+/// by following strong references or weak references of higher strength.
+///
+/// This method can also be used during tracing for debugging purposes.
+///
+/// When called at other times, particularly during mutator time, the behavior is specific to
+/// the implementation of the plan and policy due to their strategies of metadata clean-up. If
+/// the VM needs to know if any given reference is still valid, it should instead use the valid
+/// object bit (VO-bit) metadata, which is enabled by the Cargo feature "vo_bit".
+///
+/// # Return value
+///
+/// It returns `true` if one of the following is true:
+///
+/// 1. The object has been traced (i.e. reached) since tracing started.
+/// 2. The policy conservatively considers the object reachable even though it has not been
+///    traced.
+///    - Particularly, if the plan is generational, this method will return `true` if the
+///      object is mature during nursery GC.
+///
+/// Because of this conservativeness, if this method returns `true`, it does not necessarily mean
+/// the object must be reachable from roots. In generational GC, mature objects can be unreachable
+/// from roots while the GC chooses not to reclaim their memory during nursery GC. Likewise,
+/// all young objects reachable from the remembered set are retained even though some mature
+/// objects in the remembered set can be unreachable in the first place. (This is known as
+/// *nepotism* in GC literature.)
+///
 /// Note: Objects in ImmortalSpace may have `is_live = true` but are actually unreachable.
 pub fn is_reachable(self) -> bool {
     unsafe { SFT_MAP.get_unchecked(self.to_raw_address()) }.is_reachable(self)
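The conservativeness described in the doc comment of `is_reachable` can be illustrated with a toy generational model. In this sketch, `Obj`, `GcKind` and the free function `is_reachable` are hypothetical. The function only restates the documented rule (traced, or mature during a nursery GC) and does not call into MMTk. It is not how MMTk's policies actually implement the check.

```rust
/// A toy object record (not an MMTk type).
#[derive(Debug, Clone, Copy)]
pub struct Obj {
    /// True if the object lives in the mature space.
    pub mature: bool,
    /// True if the current GC's tracing has reached the object.
    pub traced: bool,
}

/// Kinds of GC in a toy generational plan.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GcKind {
    Nursery,
    FullHeap,
}

/// Restates the documented rule: an object counts as reachable if it was
/// traced, or if the policy conservatively keeps it (mature objects are not
/// traced, and therefore not reclaimed, during a nursery GC).
pub fn is_reachable(obj: Obj, gc: GcKind) -> bool {
    obj.traced || (gc == GcKind::Nursery && obj.mature)
}
```

An untraced mature object reports `true` during a nursery GC and `false` only during a full-heap GC. A `true` result is therefore not proof that the object is reachable from roots.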

src/vm/scanning.rs

Lines changed: 76 additions & 45 deletions
@@ -282,64 +282,95 @@ pub trait Scanning<VM: VMBinding> {
 
 /// Process weak references.
 ///
-/// This function is called after a transitive closure is completed.
+/// This function is called in a GC after the transitive closure from roots is computed, that
+/// is, all reachable objects from roots are reached. This function gives the VM binding an
+/// opportunity to process finalizers and weak references.
 ///
 /// MMTk core enables the VM binding to do the following in this function:
 ///
-/// 1. Query if an object is already reached in this transitive closure.
+/// 1. Query if an object is already reached.
+///    - by calling `ObjectReference::is_reachable()`
 /// 2. Get the new address of an object if it is already reached.
+///    - by calling `ObjectReference::get_forwarded_object()`
 /// 3. Keep an object and its descendents alive if not yet reached.
+///    - using `tracer_context`
 /// 4. Request this function to be called again after transitive closure is finished again.
-///
-/// The VM binding can query if an object is currently reached by calling
-/// `ObjectReference::is_reachable()`.
-///
-/// If an object is already reached, the VM binding can get its new address by calling
-/// `ObjectReference::get_forwarded_object()` as the object may have been moved.
-///
-/// If an object is not yet reached, the VM binding can keep that object and its descendents
-/// alive. To do this, the VM binding should use `tracer_context.with_tracer` to get access to
-/// an `ObjectTracer`, and then call its `trace_object(object)` method. The `trace_object`
-/// method will return the new address of the `object` if it moved the object, or its original
-/// address if not moved. Implementation-wise, the `ObjectTracer` may contain an internal
-/// queue for newly traced objects, and will flush the queue when `tracer_context.with_tracer`
-/// returns. Therefore, it is recommended to reuse the `ObjectTracer` instance to trace
-/// multiple objects.
-///
-/// *Note that if `trace_object` is called on an already reached object, the behavior will be
-/// equivalent to `ObjectReference::get_forwarded_object()`. It will return the new address if
-/// the GC already moved the object when tracing that object, or the original address if the GC
-/// did not move the object when tracing it. In theory, the VM binding can use `trace_object`
-/// wherever `ObjectReference::get_forwarded_object()` is needed. However, if a VM never
-/// resurrects objects, it should completely avoid touching `tracer_context`, and exclusively
-/// use `ObjectReference::get_forwarded_object()` to get new addresses of objects. By doing
-/// so, the VM binding can avoid accidentally resurrecting objects.*
-///
-/// The VM binding can return `true` from `process_weak_refs` to request `process_weak_refs`
-/// to be called again after the MMTk core finishes transitive closure again from the objects
-/// newly visited by `ObjectTracer::trace_object`. This is useful if a VM supports multiple
-/// levels of reachabilities (such as Java) or ephemerons.
-///
-/// Implementation-wise, this function is called as the "sentinel" of the `VMRefClosure` work
-/// bucket, which means it is called when all work packets in that bucket have finished. The
-/// `tracer_context` expands the transitive closure by adding more work packets in the same
-/// bucket. This means if `process_weak_refs` returns true, those work packets will have
-/// finished (completing the transitive closure) by the time `process_weak_refs` is called
-/// again. The VM binding can make use of this by adding custom work packets into the
-/// `VMRefClosure` bucket. The bucket will be `VMRefForwarding`, instead, when forwarding.
-/// See below.
+///    - by returning `true`
+///
+/// The `tracer_context` parameter provides the VM binding with a mechanism for retaining
+/// unreachable objects (i.e. keeping them alive in this GC). The following snippet shows a
+/// typical use case of handling finalizable objects for a Java-like language.
+///
+/// ```rust
+/// let finalizable_objects: Vec<ObjectReference> = my_vm::get_finalizable_object();
+/// let mut new_finalizable_objects = vec![];
+///
+/// tracer_context.with_tracer(worker, |tracer| {
+///     for object in finalizable_objects {
+///         if object.is_reachable() {
+///             // `object` is still reachable.
+///             // It may have been moved if it is a copying GC.
+///             let new_object = object.get_forwarded_object().unwrap_or(object);
+///             new_finalizable_objects.push(new_object);
+///         } else {
+///             // `object` is unreachable.
+///             // Retain it, and enqueue it for postponed finalization.
+///             let new_object = tracer.trace_object(object);
+///             my_vm::enqueue_finalizable_object_to_be_executed_later(new_object);
+///         }
+///     }
+/// });
+/// ```
+///
+/// Within the closure `|tracer| { ... }`, the VM binding can call `tracer.trace_object(object)`
+/// to retain `object` and get its new address if moved. After `with_tracer` returns, MMTk will
+/// create work packets in the `VMRefClosure` work bucket to compute the transitive closure from
+/// the objects retained in the closure.
 ///
 /// The `memory_manager::is_mmtk_object` function can be used in this function if
 /// - the "is_mmtk_object" feature is enabled, and
 /// - `VM::VMObjectModel::NEED_VO_BITS_DURING_TRACING` is true.
 ///
 /// Arguments:
 /// * `worker`: The current GC worker.
-/// * `tracer_context`: Use this to get access an `ObjectTracer` and use it to retain and
-/// update weak references.
-///
-/// This function shall return true if this function needs to be called again after the GC
-/// finishes expanding the transitive closure from the objects kept alive.
+/// * `tracer_context`: Use this to get access to an `ObjectTracer` and use it to retain and
+///   update weak references.
+///
+/// If `process_weak_refs` returns `true`, then `process_weak_refs` will be called again after
+/// all work packets in the `VMRefClosure` work bucket have been executed, by which time all
+/// objects reachable from the objects retained in this function will have been reached.
+///
+/// # Performance notes
+///
+/// **Retain as many objects as needed in one invocation of `tracer_context.with_tracer`, and
+/// avoid calling `with_tracer` again and again** for each object. The `tracer` provided by
+/// `ObjectTracerContext::with_tracer` enqueues retained objects in an internal list specific to
+/// this invocation of `with_tracer`, and will create reasonably sized work packets to compute
+/// the transitive closure. This means the invocation of `with_tracer` has a non-trivial
+/// overhead, but each invocation of `tracer.trace_object` is cheap.
+///
+/// *Don't do this*:
+///
+/// ```rust
+/// for object in objects {
+///     tracer_context.with_tracer(worker, |tracer| { // This is expensive! DON'T DO THIS!
+///         tracer.trace_object(object);
+///     });
+/// }
+/// ```
+///
+/// **Use `ObjectReference::get_forwarded_object()` to get the forwarded address of reachable
+/// objects. Only use `tracer.trace_object` for retaining unreachable objects.** If
+/// `trace_object` is called on an already reached object, it will also return its new address
+/// if moved. However, `tracer_context.with_tracer` has a cost, and the VM binding may
+/// accidentally "resurrect" dead objects if it fails to check `object.is_reachable()` first. If
+/// the VM binding does not intend to retain any objects, it should completely avoid touching
+/// `tracer_context`.
+///
+/// **Clone the `tracer_context` for parallelism.** The `ObjectTracerContext` has `Clone` as
+/// its supertrait. The VM binding can clone it and distribute each clone into a work packet.
+/// By doing so, the VM binding can parallelize the processing of finalizers and weak references
+/// by creating multiple work packets.
 fn process_weak_refs(
     _worker: &mut GCWorker<VM>,
     _tracer_context: impl ObjectTracerContext<VM>,
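The doc comment advises bindings that never resurrect objects to avoid `tracer_context` entirely and rely on `is_reachable` and `get_forwarded_object` alone. Below is a runnable sketch of that pattern for plain weak references. `ObjRef`, `Heap` and `WeakRef` are hypothetical stand-ins for `ObjectReference`, the GC's reachability and forwarding state, and a binding's weak-reference slots. They are not MMTk's API. Because the function has no tracer to call, it cannot keep a dead referent alive by mistake.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `ObjectReference`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Toy heap state after the transitive closure (not an MMTk type).
pub struct Heap {
    /// Objects reached by tracing.
    pub reached: HashSet<ObjRef>,
    /// New addresses of objects moved by a (toy) copying GC.
    pub forwarded: HashMap<ObjRef, ObjRef>,
}

impl Heap {
    pub fn is_reachable(&self, o: ObjRef) -> bool {
        self.reached.contains(&o)
    }
    pub fn get_forwarded_object(&self, o: ObjRef) -> Option<ObjRef> {
        self.forwarded.get(&o).copied()
    }
}

/// A hypothetical weak reference slot held by the binding.
#[derive(Debug, PartialEq)]
pub struct WeakRef {
    pub referent: Option<ObjRef>,
}

/// Clears weak references whose referents are unreachable and updates the
/// rest to the referents' new addresses. It never retains any object.
pub fn clear_or_update_weak_refs(heap: &Heap, refs: &mut [WeakRef]) {
    for r in refs.iter_mut() {
        if let Some(obj) = r.referent {
            r.referent = if heap.is_reachable(obj) {
                // Reachable: the object may have moved.
                Some(heap.get_forwarded_object(obj).unwrap_or(obj))
            } else {
                // Unreachable: clear the reference instead of resurrecting it.
                None
            };
        }
    }
}
```

With objects 1 and 2 reached and object 2 moved to 20, weak references to `[1, 2, 3]` become `[Some(1), Some(20), None]`, and object 3 remains unreachable.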
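When `process_weak_refs` returns `true`, MMTk completes the transitive closure and then calls the function again. A binding can use this to handle several reference strengths one after another. A minimal sketch, assuming a Java-like order of soft, weak, final and phantom references, models this as a small state machine. `Stage`, `RefProcessor` and `driver` are hypothetical. `driver` only imitates the documented behaviour of calling the hook again after the `VMRefClosure` bucket drains. The per-stage work (clearing or retaining referents) is omitted.

```rust
/// Hypothetical processing stages, strongest reference kind first.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    Soft,
    Weak,
    Final,
    Phantom,
    Done,
}

pub struct RefProcessor {
    pub stage: Stage,
    /// Records which stages ran, for illustration.
    pub log: Vec<Stage>,
}

impl RefProcessor {
    pub fn new() -> Self {
        RefProcessor { stage: Stage::Soft, log: vec![] }
    }

    /// Mirrors the contract of `process_weak_refs`: handle one stage, then
    /// return `true` to be called again after the closure is expanded.
    pub fn process_weak_refs(&mut self) -> bool {
        self.log.push(self.stage);
        // (A real binding would clear or retain referents of this kind here.)
        self.stage = match self.stage {
            Stage::Soft => Stage::Weak,
            Stage::Weak => Stage::Final,
            Stage::Final => Stage::Phantom,
            Stage::Phantom | Stage::Done => Stage::Done,
        };
        self.stage != Stage::Done
    }
}

/// Imitates MMTk: compute a closure, call the hook, repeat while it asks.
/// Returns how many closures were computed, including the one from roots.
pub fn driver(p: &mut RefProcessor) -> usize {
    let mut closures = 0;
    loop {
        closures += 1; // stand-in for draining the `VMRefClosure` bucket
        if !p.process_weak_refs() {
            break;
        }
    }
    closures
}
```

`driver` reports four closure computations. The first starts from roots, and one more follows each of the soft, weak and final stages. The phantom stage returns `false` because no stage is left. A binding with a single level of weak references would simply return `false` on the first call.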
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+//! This module tests the example code in `Scanning::process_weak_refs` and `weakref.md` in the
+//! Porting Guide. We only check if the example code compiles. We cannot actually run it because
+//! we can't construct a `GCWorker`.
+
+use crate::{
+    scheduler::GCWorker,
+    util::ObjectReference,
+    vm::{ObjectTracer, ObjectTracerContext, Scanning, VMBinding},
+};
+
+use super::mock_test_prelude::MockVM;
+
+#[allow(dead_code)] // We don't construct this struct as we can't run it.
+struct VMScanning;
+
+// Just to make the code example look better.
+use MockVM as MyVM;
+
+// Placeholders for functions supposed to be implemented by the VM.
+mod my_vm {
+    use crate::util::ObjectReference;
+
+    pub fn get_finalizable_object() -> Vec<ObjectReference> {
+        unimplemented!()
+    }
+
+    pub fn set_new_finalizable_objects(_objects: Vec<ObjectReference>) {}
+
+    pub fn enqueue_finalizable_object_to_be_executed_later(_object: ObjectReference) {}
+}
+
+// ANCHOR: process_weak_refs_finalization
+impl Scanning<MyVM> for VMScanning {
+    fn process_weak_refs(
+        worker: &mut GCWorker<MyVM>,
+        tracer_context: impl ObjectTracerContext<MyVM>,
+    ) -> bool {
+        let finalizable_objects: Vec<ObjectReference> = my_vm::get_finalizable_object();
+        let mut new_finalizable_objects = vec![];
+
+        tracer_context.with_tracer(worker, |tracer| {
+            for object in finalizable_objects {
+                if object.is_reachable() {
+                    // `object` is still reachable.
+                    // It may have been moved if it is a copying GC.
+                    let new_object = object.get_forwarded_object().unwrap_or(object);
+                    new_finalizable_objects.push(new_object);
+                } else {
+                    // `object` is unreachable.
+                    // Retain it, and enqueue it for postponed finalization.
+                    let new_object = tracer.trace_object(object);
+                    my_vm::enqueue_finalizable_object_to_be_executed_later(new_object);
+                }
+            }
+        });
+
+        my_vm::set_new_finalizable_objects(new_finalizable_objects);
+
+        false
+    }
+
+    // ...
+    // ANCHOR_END: process_weak_refs_finalization
+
+    // Methods after this are placeholders. We only ensure they compile.
+
+    fn scan_object<SV: crate::vm::SlotVisitor<<MockVM as VMBinding>::VMSlot>>(
+        _tls: crate::util::VMWorkerThread,
+        _object: ObjectReference,
+        _slot_visitor: &mut SV,
+    ) {
+        unimplemented!()
+    }
+
+    fn notify_initial_thread_scan_complete(_partial_scan: bool, _tls: crate::util::VMWorkerThread) {
+        unimplemented!()
+    }
+
+    fn scan_roots_in_mutator_thread(
+        _tls: crate::util::VMWorkerThread,
+        _mutator: &'static mut crate::Mutator<MockVM>,
+        _factory: impl crate::vm::RootsWorkFactory<<MockVM as VMBinding>::VMSlot>,
+    ) {
+        unimplemented!()
+    }
+
+    fn scan_vm_specific_roots(
+        _tls: crate::util::VMWorkerThread,
+        _factory: impl crate::vm::RootsWorkFactory<<MockVM as VMBinding>::VMSlot>,
+    ) {
+        unimplemented!()
+    }
+
+    fn supports_return_barrier() -> bool {
+        unimplemented!()
+    }
+
+    fn prepare_for_roots_re_scanning() {
+        unimplemented!()
+    }
+}
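The test module above only checks that the finalization example compiles, because a `GCWorker` cannot be constructed in the mock tests. The sketch below re-creates the same control flow against hypothetical stand-ins so that it can actually run. `ObjRef` plays the role of `ObjectReference`, and `Heap` records which objects the closure reached and where a copying GC moved them. The free function `with_tracer` and `Tracer::trace_object` imitate `ObjectTracerContext::with_tracer` and `ObjectTracer::trace_object`. None of these names are MMTk's API; only the decision logic follows the documented example.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `ObjectReference`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Toy heap state after the transitive closure from roots.
#[derive(Default)]
pub struct Heap {
    pub reached: HashSet<ObjRef>,
    pub forwarded: HashMap<ObjRef, ObjRef>,
    /// How often `with_tracer` was called (each call is costly in MMTk).
    pub with_tracer_calls: usize,
}

impl Heap {
    pub fn is_reachable(&self, o: ObjRef) -> bool {
        self.reached.contains(&o)
    }
    pub fn get_forwarded_object(&self, o: ObjRef) -> Option<ObjRef> {
        self.forwarded.get(&o).copied()
    }
}

/// Toy tracer: retaining an object marks it reached (no copying here).
pub struct Tracer<'a> {
    heap: &'a mut Heap,
}

impl Tracer<'_> {
    pub fn trace_object(&mut self, o: ObjRef) -> ObjRef {
        self.heap.reached.insert(o);
        o
    }
}

/// Toy counterpart of `ObjectTracerContext::with_tracer`.
pub fn with_tracer<R>(heap: &mut Heap, f: impl FnOnce(&mut Tracer<'_>) -> R) -> R {
    heap.with_tracer_calls += 1;
    f(&mut Tracer { heap })
}

/// Returns (objects still registered as finalizable, objects to finalize later).
pub fn process_finalizers(heap: &mut Heap, finalizable: Vec<ObjRef>) -> (Vec<ObjRef>, Vec<ObjRef>) {
    let mut still_finalizable = vec![];
    let mut to_finalize = vec![];
    // One `with_tracer` call for the whole batch, as the performance notes advise.
    with_tracer(heap, |tracer| {
        for object in finalizable {
            if tracer.heap.is_reachable(object) {
                // Still reachable; it may have been moved.
                let new_object = tracer.heap.get_forwarded_object(object).unwrap_or(object);
                still_finalizable.push(new_object);
            } else {
                // Unreachable: retain it so that its finalizer can run later.
                to_finalize.push(tracer.trace_object(object));
            }
        }
    });
    (still_finalizable, to_finalize)
}
```

Given finalizable objects `[1, 2, 3]`, where objects 1 and 2 were reached and object 2 was moved to 20, the function returns `[1, 20]` as still registered and `[3]` for postponed finalization, using a single `with_tracer` call. Object 3 is then marked reached, which is the intended resurrection for finalization.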

src/vm/tests/mock_tests/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -67,3 +67,4 @@ mod mock_test_vm_layout_log_address_space;
 
 mod mock_test_doc_avoid_resolving_allocator;
 mod mock_test_doc_mutator_storage;
+mod mock_test_doc_weakref_code_example;
