-
Notifications
You must be signed in to change notification settings - Fork 699
[RFC] Refactor CPUFunction and InterpreterFunction to remove per-run state #2274
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that this creates an unowned tensor.
Refactored based on @opti-mix's suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think separating the execution state from the CompiledFunction is the right direction. I've one high-level question before I get too deep into the review though: the original intent of "setupRuns" was to prepare the device for execution of a particular model (loading the code/weights, etc.). We don't want to do that stuff on every execute
, so where should that happen with this approach?
@bertmaher That device preparation stuff should happen in the DeviceManager, I think. E.g. for the case of moving constants to the device the DeviceManager should do it in the |
Worth calling out that this PR changes the |
15cc01c
to
2e6a7cd
Compare
Can I get a review on this? I've got some follow ups piling up. |
lib/Backends/CPU/CPUFunction.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be fine for now (alloc and dealloc every inference request). But technically we could store this in thread local and reuse buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if we run into CPU backend perf concerns we should add a memory pool here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic :) comment about ctx was already in place
lib/Backends/CPU/CPUFunction.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I'm a bit confused. Placeholders are for inputs/outputs but not for constant tensors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm maintaining existing behaviour of CPUFunction in this diff, which is that all constants & placeholders have their space allocated in the RuntimeBundle and then we copy them into the per-run memory block for execution. The memory should be uninitialized so we don't need to memcpy it, but figured we could fix that when we get to it. It is a known issue with the RuntimeBundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used.
lib/Backends/OpenCL/OpenCL.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just do: execute (Context *) and remove (void)ctx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, @opti-mix has a comment asking for the reverse above. Personally, not worried either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, does not matter indeed.
lib/Backends/CPU/CPUFunction.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd remove comment, does not provide any additional info on top of the var name.
lib/Backends/OpenCL/OpenCL.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to PR but ///@? needs to be closed after void tearDownRuns() override;
ah damn I pushed but didn't add the changes, i'll get em in the next one |
sounds good |
Description: To support running a compiled function multiple times, particularly concurrently on different devices, we must remove per-run state from the CompiledFunction.
This is a suggested solution for the CPU and Interpreter backend:
The side effect of these changes is to make the non execute() members of CompiledFunction (setupRuns/tearDownRuns and beforeRun/afterRun) empty. The multi-stage execution flow is inherently stateful so I think we should remove them for all backends and move their logic to the various DeviceManagers.
Testing: Unit tests in debug, release & asan.
Documentation: Will need to update, but interested in peoples thoughts.