@ysbaddaden
Collaborator

No description provided.

@ysbaddaden ysbaddaden self-assigned this Feb 5, 2024
@beta-ziliani
Member

beta-ziliani commented Feb 5, 2024

@straight-shoota
Member

The distinction between execution context and scheduler could use a bit of refinement. There is definitely some overlap in functionality, just by comparing the API. I guess execution contexts might take over some features of the current scheduler?

Member

@RX14 RX14 left a comment

Absolutely lovely, well-written proposal. I completely agree with the design intent here, and only have a few—mostly overlapping—comments about event loops and the default context. The vast majority of this design is exactly what I would like to see in crystal.


## Default context configuration

This proposal doesn’t solve the inherent problem of: how can applications configure the default context at runtime (e.g. number of MT schedulers) since we create the context before the application’s main can start.
Member

This proposal supports creating multiple execution contexts, so the application can configure its own EC and start fibers in it if required. That allows all the complexity the app needs when configuring the context it actually runs in, because that context is initialized by the application. The root context does not have to be heavily used by the application.

@straight-shoota straight-shoota changed the title RFC 0002 - MT Execution Contexts RFC 0002: MT Execution Contexts Feb 8, 2024
@ysbaddaden
Collaborator Author

ysbaddaden commented Feb 13, 2024

@RX14 Tell me if I'm wrong, the plan would be:

Crystal 1

  • introduce EC with ST and MT;
  • deprecate same_thread argument;
  • same_thread: false is NOOP;
  • ST accepts same_thread: true (always true anyway);
  • MT raises on same_thread: true (new API, no breaking change);
  • default EC is ST (no breaking change);
  • consider a -Dmt flag to force default EC to be MT (?);

Crystal 2

  • remove deprecated same_thread (breaking change);
  • default EC becomes MT (breaking change).
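
As a rough illustration of the `same_thread` rules in the Crystal 1 list above, here is a minimal sketch. The class names (`SketchContext`, `SingleThreadedSketch`, `MultiThreadedSketch`) and the `check_same_thread` method are hypothetical stand-ins invented for this example; they are not the RFC's or the stdlib's API.

```crystal
# Hypothetical stand-ins for the proposed contexts; not the RFC's actual classes.
abstract class SketchContext
  # Returns normally when the (deprecated) `same_thread` argument is acceptable.
  abstract def check_same_thread(same_thread : Bool) : Nil
end

class SingleThreadedSketch < SketchContext
  # One thread only: fibers always share it, so both values are harmless.
  def check_same_thread(same_thread : Bool) : Nil
  end
end

class MultiThreadedSketch < SketchContext
  # `false` is a no-op; `true` cannot be honored, so it raises.
  def check_same_thread(same_thread : Bool) : Nil
    raise ArgumentError.new("same_thread: true is not supported in an MT context") if same_thread
  end
end

st = SingleThreadedSketch.new
mt = MultiThreadedSketch.new

st.check_same_thread(true)  # accepted (always true anyway)
mt.check_same_thread(false) # no-op

begin
  mt.check_same_thread(true)
rescue ex : ArgumentError
  puts ex.message
end
```

Since only the new MT context raises, code that never opts into it keeps working, which is why the list calls this "no breaking change".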

Isolated context

I think I see that context for UI loops only, and want to prevent blocking behaviors, but there's nothing wrong with doing blocking calls in other use cases, and using the event-loop normally is fine. Still, spawning a fiber without an explicit context should either raise or the default context should be configured (as you suggest):

abstract class ExecutionContext
  class Isolated < ExecutionContext
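    # The block is captured (`&block`): Crystal doesn't allow `yield` inside
    # the captured blocks passed to `Thread.new` and `spawn`.
    def initialize(name : String, @spawn_context : ExecutionContext? = nil, &block)
      @thread = Thread.new(name) { block.call }
    end

    def spawn(**args, &block) : Fiber
      if ctx = @spawn_context
        ctx.spawn(**args, &block)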
      else
        raise RuntimeError.new("Can't spawn in isolated context (need a spawn context)")
      end
    end
  end
end

mt = ExecutionContext::MultiThreaded.new
ui = ExecutionContext::Isolated.new("GTK", spawn_context: mt) { Gtk.main }

Instead of raising, the spawn context could be the default EC.

@ysbaddaden
Collaborator Author

@RX14 I applied your suggestions to the RFC.

There's no such method
@RX14
Member

RX14 commented Feb 14, 2024

@ysbaddaden I think -Dmt is probably not necessary, and the exact implementation plan for crystal 2 is best left deferred until there's operational experience, but I agree on everything else.

I see the question of whether the root execution context is MT or ST as moot, because every well-architected app has a single App.run line at the top level, and converting that top-level code to spawn an MT context and wait for that fiber should be a one-liner if we have the right helper methods in place.

If we all agree, maybe we can start on the other 90% of the RFC: bikeshedding naming. I like ExecutionContext::Parallel, because I don't like the idea of implementation details (threads) leaking into the name.

@ysbaddaden
Collaborator Author

@RX14 The mt flag may not be necessary in Crystal v1, as the default context could be MT:1 and resized on demand (still no breaking change). I'll still push for MT:N to be the default in Crystal v2. Execution contexts are a means to further control the parallelism in very specific cases, not the end solution. I believe developers shouldn't have to care about it until they have to.

I wouldn't bikeshed the namings just yet. As I'm experimenting with the types, I feel that the difference is getting thinner and thinner. In fact, Kotlin only has a single scheduler implementation, and a couple constructors to start execution contexts with 1 (ST) or many threads (MT).

I'm also struggling with the inheritance: EC::MT < EC makes sense, but so does EC::MT::Scheduler < EC as we want EC.current to point to the current MT scheduler running on the thread, not the shared MT context (it's easier to reach the context from the scheduler).

@crysbot

crysbot commented Feb 21, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/crystal-multithreading-support/6622/8

@ysbaddaden
Collaborator Author

ysbaddaden commented Feb 23, 2024

I forgot again but MT:1 would break spawn(same_thread: true) in Crystal 1. It's a NOOP without the preview_mt flag but the parameter was still exposed to the public API 😭

@straight-shoota
Member

straight-shoota commented Feb 23, 2024

I think we can accept breakage with same_thread: true. It only works with preview_mt which is explicitly a preview feature. There should be no expectations on compatibility in a setting outside of preview_mt.

Contributor

@yxhuvud yxhuvud left a comment

I like this, I like this a lot.


Such a group of fibers will never run in parallel. This can vastly simplify the synchronization logic since you don’t have to deal with parallelism anymore, only concurrency, which is much easier & faster to deal with. For example, there is no need for costly atomic operations; you can simply access a value directly. Parallelism issues and their impact on the application performance are limited to the global communication.

## Issues
Contributor

I think data and especially execution locality could show up on the negative side as well, as round-robin scheduling takes away a lot of programmatic control over data locality. It is possible to manually schedule fibers to dedicated threads, but that is really not how it is currently meant to be used.

- a scheduler to run the fibers (or many schedulers for a MT context);
- an event loop (IO & timers):

=> this might be complex: I don’t think we can share a libevent across event bases? we already need to have a “thread local” libevent object for IO objects as well as for PCRE2 (though this is an optimization).
Contributor

@yxhuvud yxhuvud Feb 23, 2024

I'm in the 'let the event loop decide if it wants to be instantiated at the thread, execution context or global level' camp. How that would look API-wise I'm less sure, especially if a dynamic number of threads in a context is to be supported.

This might be complex: I don’t think we can share a libevent across event bases

From what I have gathered from libevent docs, it is possible, but would necessitate a lot more synchronization when io happens (*), so it is probably slower.

But yes, it is complex. Windows and its weird file handles say hi. Each open file handle is tied to a single instance of whatever it uses, so there needs to be a single global event instance there.

(*) We already enable some structures for thread safety but then create separate bases for each thread anyhow, IIRC. It has been quite a while since I looked at it. I think we can remove that enabling without danger (they should really only be used when actually reusing a libevent base between threads). We don't use libevent's specialized MT-safe functions that make use of it.

Comment on lines 355 to 357
def initialize(@name : String, @minimum : Int32, @maximum : Int32)
# todo: start @minimum threads
end
Contributor

Allowing a dynamic number of threads requires more synchronization and complexity than having a static number. While it sounds nice to be able to adjust, it probably warrants its own separate class. Making certain all threads are in a waiting state before starting to actually queue stuff allows a bunch of simplifications, with fewer mutexes and far fewer possible race conditions.

Collaborator Author

Enqueue doesn't need much synchronization. Go pushes to a bounded local queue (per scheduler) with overflow to a global queue; threads can be started at any time: when they reach the run loop they will grab a batch of fibers from the global queue or steal from another scheduler. Stopping isn't more complex: schedulers aren't tied to a specific thread, so the thread detaches the scheduler and returns itself to the thread pool.

The complexity is more in when to start / stop a thread.
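
To make the queueing scheme described above concrete, here is a minimal single-threaded model. It is only a sketch under loud assumptions: `SketchScheduler`, `LOCAL_CAPACITY` and the use of plain `Deque`s holding integer fiber ids are all invented for illustration, a real scheduler would use thread-safe queues, and this version takes one fiber at a time where the comment describes grabbing a batch.

```crystal
# Hypothetical, single-threaded model of Go-style run queues.
# Real schedulers use thread-safe queues; plain Deques are used here for clarity.
class SketchScheduler
  LOCAL_CAPACITY = 4

  getter local = Deque(Int32).new

  def initialize(@global : Deque(Int32))
  end

  # Push to the bounded local queue; overflow goes to the shared global queue.
  def enqueue(fiber_id : Int32) : Nil
    if @local.size < LOCAL_CAPACITY
      @local.push(fiber_id)
    else
      @global.push(fiber_id)
    end
  end

  # Pick the next fiber: local queue first, then the global queue,
  # then steal from the tail of another scheduler's local queue.
  def dequeue(others : Array(SketchScheduler)) : Int32?
    if fiber_id = @local.shift?
      return fiber_id
    end
    if fiber_id = @global.shift?
      return fiber_id
    end
    others.each do |other|
      next if other.same?(self)
      if fiber_id = other.local.pop?
        return fiber_id
      end
    end
    nil
  end
end

global = Deque(Int32).new
a = SketchScheduler.new(global)
b = SketchScheduler.new(global) # e.g. a scheduler on a freshly started thread

6.times { |i| a.enqueue(i) } # 0..3 stay local, 4 and 5 overflow to global

schedulers = [a, b]
puts b.dequeue(schedulers) # takes 4 from the global queue
puts b.dequeue(schedulers) # takes 5 from the global queue
puts b.dequeue(schedulers) # global is empty: steals 3 from a's local queue
```

In this model a scheduler that starts late (`b`) finds work without anyone handing it over, which matches the point that threads can be started at any time.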

Collaborator Author

And that complexity pushes for it to become a "future evolution".

@Blacksmoke16
Member

As someone who isn't familiar with this stuff at all, my random question is:

Do we need to do anything in relation to like how Intel has P and E cores now? Like as a way to signal to the OS's thread scheduler that a fiber should have a preference on where it runs? Or is that something the OS itself handles somehow?

@ysbaddaden
Collaborator Author

@Blacksmoke16 From what I read, specifying a thread priority can hint the OS to schedule the thread on a big (performance) or little (efficiency) core. We can also set a thread's affinity to a given core, but we must detect the core type beforehand.

Adds notes about wrapping an existing EC, and thread affinities
(to pin a thread to a core) in addition to setting priorities (still no API).

Simplifies the EC API to remove `yield` and `sleep` that may not
be needed (the `Fiber.yield` and `sleep` methods can create the
resume events), but adds `spawn(same_thread)` to handle the
transition.
straight-shoota pushed a commit to crystal-lang/crystal that referenced this pull request Dec 6, 2024
Upgrades the IOCP event loop for Windows to be on par with the Polling event loops (epoll, kqueue) on UNIX. After a few low hanging fruits (enqueue multiple fibers on each call, for example) the last commit completely rewrites the `#run` method:

- store events in pairing heaps;
- high resolution timers (`CreateWaitableTimer`);
- block forever/never (no need for timeout);
- cancelling timeouts (no more dead fibers);
- thread safety (parallel timer de/enqueues) for [RFC #2];
- interrupt run using completion key instead of a UserAPC for [RFC #2] (untested).

[RFC #2]: crystal-lang/rfcs#2
@crysbot

crysbot commented Dec 17, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/upcoming-release-1-15-0/7537/1

straight-shoota pushed a commit to crystal-lang/crystal that referenced this pull request Dec 25, 2024
In an MT environment such as proposed in crystal-lang/rfcs#2, the main thread's fiber may be resumed by any thread, and it may return, which would terminate the program... but it might return from _another thread_ than the process' main thread, which may be unexpected by the OS.

This patch instead explicitly exits from `main` and `wmain`.

For backward compatibility reasons (win32 `wmain` and wasi `__main_argc_argv` both call `main` and are documented to do so), the default `main` still returns, but is being replaced for UNIX targets by one that exits.

Maybe the actual OS entrypoint could merely call `Crystal.main` instead of `main` and explicitly exit (there wouldn't be a global `main` except for `UNIX`), but this is out of scope for this PR.
straight-shoota added a commit to crystal-lang/crystal that referenced this pull request Feb 22, 2025
Integrates the skeleton as per crystal-lang/rfcs#2

- Add the `ExecutionContext` module;
- Add the `ExecutionContext::Scheduler` module;
- Add the `execution_context` compile-time flag.

When the `execution_context` flag is set:

- Don't load `Crystal::Scheduler`;
- Plug `ExecutionContext` instead of `Crystal::Scheduler` in `spawn`, `Fiber`, ...

This is only the skeleton: there are no implementations (yet). Trying to compile anything with `-Dexecution_context` will fail until the ST and/or MT context are implemented.

Co-authored-by: Johannes Müller <[email protected]>
kojix2 pushed a commit to kojix2/crystal that referenced this pull request Feb 23, 2025
@ysbaddaden
Collaborator Author

The current constructors are rather complex and inconvenient to type, and pushing ExecutionContext under the Fiber namespace didn't help. I mean:

ctx = Fiber::ExecutionContext::Isolated.new(name) { }
ctx = Fiber::ExecutionContext::SingleThreaded.new(name)
ctx = Fiber::ExecutionContext::MultiThreaded.new(name, size: 1..4)

I'm wondering if we could have simpler helpers. For example (thinking out loud, don't take things for granted):

ctx = Fiber.start_isolated(name) { } # or #spawn_isolated ?
ctx = Fiber.start_concurrent_context(name)
ctx = Fiber.start_parallel_context(name, size: 1..4)

I'm not sure it's really better. Maybe a bit more explicit thanks to the start_ prefix?

The naming might also indicate that maybe the ST and MT classes should be named Concurrent and Parallel instead of SingleThreaded and MultiThreaded?

@ysbaddaden ysbaddaden marked this pull request as ready for review February 26, 2025 09:24
@crysbot

crysbot commented Mar 25, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/upcoming-release-1-16-0/7883/1

straight-shoota pushed a commit to crystal-lang/crystal that referenced this pull request Jul 2, 2025
The `@[ThreadLocal]` annotation only works on some targets and doesn't allow registering a destructor callback that will be invoked when a thread shuts down.

We currently don't have threads shutting down, but with [RFC 2] it will start happening (at least isolated contexts are expected to shut down, others should eventually evolve to shut down too). The alternative is to use the complex `Crystal::ThreadLocalValue` to tie a value to a thread, which in turn requires finalize methods.

[RFC 2]: crystal-lang/rfcs#2
@ysbaddaden
Collaborator Author

ysbaddaden commented Jul 22, 2025

I updated the phrasing and names of the proposed execution contexts.

Reading the RFC again, I believe that its purpose, to propose a new API to achieve parallelism in Crystal, can be considered done, even though the feature isn't fully released.

I propose to archive RFC 0002 and to start a new one with details on the actual execution contexts (Concurrent, Parallel, Isolated), what guarantees they bring and how they differ, and what still needs to be done: resize, shutdown, thread pool and detach on syscall, mostly.

Comment on lines +412 to +414
In practice, the concurrent context might not bring much performance improvement
over the parallel context with a single thread, and both might share the same
base.
Member

thought: I'm still not sure if it's even worth having a separate concurrent context when there's not much practical benefit over a parallel context with max=1.

Collaborator Author

Well, while ST is fixed to at most 1 thread at any time, MT:1 could grow to MT:N under the hood.

Member

But not on its own, only if you rescale the context. Or am I missing something?

Collaborator Author

@ysbaddaden ysbaddaden Jul 28, 2025

It must be an explicit call, sure, but it can happen, and a compilation error, or at least a runtime exception, is better than running into segfaults. That doesn't mean that a Concurrent context needs a specific implementation, but that it needs to make sure that the context can NEVER be resized.

This guarantee alone seems enough to warrant the existence of the context.
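
One way to picture the "can NEVER be resized" guarantee is a type that simply has no resize method, so misuse is caught at compile time. The sketch below assumes two hypothetical classes, `ParallelSketch` and `ConcurrentSketch`, with an invented `resize` method; they only model the guarantee and are not the stdlib's `Fiber::ExecutionContext` types.

```crystal
# Hypothetical sketch: these classes only model the resize guarantee,
# they are not the stdlib's Fiber::ExecutionContext types.
class ParallelSketch
  getter size : Int32

  def initialize(@size : Int32 = 1)
  end

  # An explicit call can grow MT:1 into MT:N.
  def resize(new_size : Int32) : Nil
    raise ArgumentError.new("size must be positive") unless new_size > 0
    @size = new_size
  end
end

class ConcurrentSketch
  # Always exactly one thread. There is deliberately no `resize` method,
  # so `ConcurrentSketch.new.resize(4)` fails to compile.
  def size : Int32
    1
  end
end

parallel = ParallelSketch.new
parallel.resize(4)
puts parallel.size # => 4

concurrent = ConcurrentSketch.new
puts concurrent.size                  # => 1
puts concurrent.responds_to?(:resize) # => false
```

Code that relies on fibers never running in parallel can then require the concurrent type, rather than trusting that nobody resizes a parallel context of size 1.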

@straight-shoota straight-shoota merged commit 481566b into main Aug 1, 2025
@straight-shoota straight-shoota deleted the rfc-0002-mt-execution-contexts branch August 1, 2025 08:51
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-threading Aug 1, 2025