[RFC] Refactor CPUFunction and InterpreterFunction to remove per-run state #2274

nickgg · 2019-01-16T22:16:13Z

Description: To support running a compiled function multiple times, particularly concurrently on different devices, we must remove per-run state from the CompiledFunction.

This is a suggested solution for the CPU and Interpreter backends:

For the CPUFunction: we stored the contiguous buffers for activations and weights in the function itself, and so could only have one run at a time. These buffers need to live only as long as the execution, so I have moved them into the scope of the execute() method. This also means delaying the filling of those buffers until execute().
For the InterpreterFunction: We use various Tensor objects rather than a single weights/activation buffer. These Tensors were stored in the InterpreterFunction itself, which meant concurrent runs would overwrite each other's intermediate values. The sensible place for these is the Context, so I've added to the Context the ability to store Tensors keyed by name, and removed the three Tensor maps in InterpreterFunction. This is a general interface, but I'm not sure if it will be useful outside of the interpreter.
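
A minimal sketch of the Interpreter side of this idea, not the code in this PR: every name below (`TensorSketch`, `ContextSketch`, `getOrCreateTensor`, `getTensor`, `InterpreterFunctionSketch`) is a hypothetical stand-in, and the "computation" is a placeholder. It only illustrates that once intermediates are stored by name in the per-run context, the function object itself holds no run state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tensor: a flat float buffer.
struct TensorSketch {
  std::vector<float> data;
  explicit TensorSketch(std::size_t n) : data(n, 0.0f) {}
};

// Hypothetical per-run context that owns tensors keyed by name.
class ContextSketch {
public:
  // Returns the tensor stored under `name`, creating it zero-filled if absent.
  TensorSketch *getOrCreateTensor(const std::string &name, std::size_t size) {
    auto it = tensors_.find(name);
    if (it == tensors_.end()) {
      it = tensors_.emplace(name, std::make_unique<TensorSketch>(size)).first;
    }
    return it->second.get();
  }

  // Returns the tensor stored under `name`, or nullptr if there is none.
  TensorSketch *getTensor(const std::string &name) const {
    auto it = tensors_.find(name);
    return it == tensors_.end() ? nullptr : it->second.get();
  }

private:
  std::unordered_map<std::string, std::unique_ptr<TensorSketch>> tensors_;
};

// The function object holds no tensors, so execute() can be const and one
// instance can be used with several contexts.
class InterpreterFunctionSketch {
public:
  void execute(ContextSketch *ctx) const {
    assert(ctx != nullptr);
    TensorSketch *in = ctx->getTensor("input");
    assert(in != nullptr && "caller must provide an input tensor");
    const std::size_t n = in->data.size();
    // Intermediate and output values live in the context, not in `this`.
    TensorSketch *tmp = ctx->getOrCreateTensor("tmp", n);
    TensorSketch *out = ctx->getOrCreateTensor("output", n);
    for (std::size_t i = 0; i < n; ++i) tmp->data[i] = 2.0f * in->data[i];
    for (std::size_t i = 0; i < n; ++i) out->data[i] = tmp->data[i] + 1.0f;
  }
};
```

With this shape, two runs that use separate contexts cannot overwrite each other's "tmp" tensor, which is the failure mode described above.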

The side effect of these changes is that the non-execute() members of CompiledFunction (setupRuns/tearDownRuns and beforeRun/afterRun) become empty. The multi-stage execution flow is inherently stateful, so I think we should remove them for all backends and move their logic to the various DeviceManagers.

Testing: Unit tests in debug, release & asan.
Documentation: Will need to update, but I'm interested in people's thoughts.

lib/Backends/CPU/CPUFunction.cpp

opti-mix · 2019-01-17T00:17:52Z

lib/Backends/Interpreter/InterpreterFunction.cpp

Maybe add a comment that this creates an unowned tensor.

lib/Backends/Interpreter/InterpreterFunction.h

lib/Backends/OpenCL/OpenCL.h

include/glow/Backends/CompiledFunction.h

include/glow/Graph/Context.h

lib/Graph/Context.cpp

tests/unittests/CPUDeviceManagerTest.cpp

tests/unittests/GraphTest.cpp

nickgg · 2019-01-18T00:42:41Z

Refactored based on @opti-mix's suggestion.

bertmaher

I think separating the execution state from the CompiledFunction is the right direction. I've one high-level question before I get too deep into the review though: the original intent of "setupRuns" was to prepare the device for execution of a particular model (loading the code/weights, etc.). We don't want to do that stuff on every execute, so where should that happen with this approach?

nickgg · 2019-01-18T21:59:19Z

@bertmaher That device preparation stuff should happen in the DeviceManager, I think. E.g. when constants need to be moved to the device, the DeviceManager should do it in the addNetwork() call.
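
A rough sketch of that split, under stated assumptions: `DeviceManagerSketch`, its `run()` and `uploads()` members, and the signature given to `addNetwork()` here are all hypothetical, and "uploading" is modelled as a plain copy into a map. The point is only that the one-time preparation happens when the network is added, not on each run.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device manager: owns whatever must stay resident on the device.
class DeviceManagerSketch {
public:
  // One-time, per-network preparation (here: "uploading" the constants).
  void addNetwork(const std::string &name, std::vector<float> constants) {
    resident_[name] = std::move(constants);
    ++uploads_;
  }

  // Per-run work only: reads the resident constants, never re-uploads them.
  float run(const std::string &name, const std::vector<float> &input) const {
    auto it = resident_.find(name);
    assert(it != resident_.end() && "network was not added");
    const std::vector<float> &weights = it->second;
    assert(weights.size() == input.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < weights.size(); ++i) {
      acc += weights[i] * input[i];
    }
    return acc;
  }

  // Number of times constants were uploaded, for illustration.
  std::size_t uploads() const { return uploads_; }

private:
  std::map<std::string, std::vector<float>> resident_;
  std::size_t uploads_ = 0;
};
```

Calling `run()` any number of times after a single `addNetwork()` leaves the upload count at one, which is the behaviour the original setupRuns was meant to guarantee.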

nickgg · 2019-01-18T22:18:19Z

Worth calling out that this PR changes the `CompiledFunction` interface by adding the `Context *` argument to `execute()`. We'll need to update all backends.
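
For illustration, a simplified sketch of what such an interface change implies for backends. The class and member names below are hypothetical stand-ins, not the real declarations in CompiledFunction.h.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-run state; stands in for the real Context.
struct ContextSketch {
  std::vector<float> input;
  std::vector<float> output;
};

// Sketch of the interface after the change: per-run data arrives via ctx.
class CompiledFunctionSketch {
public:
  virtual ~CompiledFunctionSketch() = default;
  virtual void execute(ContextSketch *ctx) = 0;
};

// A backend that kept a parameterless `void execute() override` would no
// longer compile, because `override` would not match any base-class function.
class DoublingBackendSketch final : public CompiledFunctionSketch {
public:
  void execute(ContextSketch *ctx) override {
    assert(ctx != nullptr);
    ctx->output.clear();
    for (float v : ctx->input) {
      ctx->output.push_back(2.0f * v);
    }
  }
};
```

Because the pure virtual signature changes, each backend has to be updated in the same change or it stops building, which is why every backend is affected.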

nickgg · 2019-01-22T23:41:14Z

Can I get a review on this? I've got some follow ups piling up.

rdzhabarov · 2019-01-23T03:34:51Z

lib/Backends/CPU/CPUFunction.cpp

this should be fine for now (alloc and dealloc on every inference request). But technically we could keep this in thread-local storage and reuse the buffers.

Yeah, if we run into CPU backend perf concerns we should add a memory pool here.
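
To make the two options in this thread concrete, here is a small sketch. Both function names are hypothetical, and real code would likely also need to handle alignment, which this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Option A ("alloc and dealloc every inference request"): a fresh scratch
// buffer whose lifetime is a single run.
std::vector<std::uint8_t> makeRunBuffer(std::size_t bytes) {
  return std::vector<std::uint8_t>(bytes);
}

// Option B: one buffer per thread, grown on demand and reused across runs.
// Contents left over from the previous run on this thread are not cleared.
std::uint8_t *threadLocalRunBuffer(std::size_t bytes) {
  thread_local std::vector<std::uint8_t> buf;
  if (buf.size() < bytes) {
    buf.resize(bytes);
  }
  assert(buf.size() >= bytes);
  return buf.data();
}
```

Option B avoids repeated allocation but keeps the largest buffer alive for the lifetime of each thread, so it trades memory for speed. A shared memory pool would be a third design with its own synchronization cost.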

rdzhabarov · 2019-01-23T03:35:42Z

include/glow/Backends/CompiledFunction.h

magic :) comment about ctx was already in place

rdzhabarov · 2019-01-23T03:38:47Z

lib/Backends/CPU/CPUFunction.h

hm, I'm a bit confused. Placeholders are for inputs/outputs but not for constant tensors.

I'm maintaining existing behaviour of CPUFunction in this diff, which is that all constants & placeholders have their space allocated in the RuntimeBundle and then we copy them into the per-run memory block for execution. The memory should be uninitialized so we don't need to memcpy it, but figured we could fix that when we get to it. It is a known issue with the RuntimeBundle.
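
A simplified sketch of the copy-then-fill behaviour described here, purely for illustration. `BundleSketch`, its fields, and `makeRunBlock` are hypothetical and much simpler than the real RuntimeBundle. It assumes a single placeholder and a bundle image that reserves space for constants and placeholders alike.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified bundle: one image with constants filled in and a
// reserved (unused) slot for the single input placeholder.
struct BundleSketch {
  std::vector<std::uint8_t> image;
  std::size_t placeholderOffset;
  std::size_t placeholderSize;
};

// Builds the per-run block: copy the whole image (placeholder slot included,
// which is the redundant copy mentioned above), then overwrite the slot with
// this run's input. The bundle itself is never modified.
std::vector<std::uint8_t> makeRunBlock(const BundleSketch &bundle,
                                       const std::uint8_t *input,
                                       std::size_t inputSize) {
  assert(inputSize == bundle.placeholderSize);
  assert(bundle.placeholderOffset + bundle.placeholderSize <=
         bundle.image.size());
  std::vector<std::uint8_t> block(bundle.image);
  std::memcpy(block.data() + bundle.placeholderOffset, input, inputSize);
  return block;
}
```

Since the bundle is only read, several runs can build their own blocks from it at once.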

rdzhabarov · 2019-01-23T03:39:29Z

lib/Backends/Interpreter/InterpreterFunction.cpp

rdzhabarov · 2019-01-23T03:43:03Z

lib/Backends/OpenCL/OpenCL.cpp

you can just do: execute(Context *) and remove (void)ctx.

heh, @opti-mix has a comment asking for the reverse above. Personally, not worried either way.

ok, does not matter indeed.

rdzhabarov · 2019-01-23T04:35:29Z

lib/Backends/CPU/CPUFunction.cpp

I'd remove this comment; it doesn't provide any additional info on top of the var name.

rdzhabarov · 2019-01-23T04:38:26Z

lib/Backends/OpenCL/OpenCL.h

not related to PR but ///@? needs to be closed after void tearDownRuns() override;

nickgg · 2019-01-23T18:00:11Z

ah damn I pushed but didn't add the changes, i'll get em in the next one

rdzhabarov · 2019-01-23T19:18:45Z

> ah damn I pushed but didn't add the changes, i'll get em in the next one

sounds good

facebook-github-bot added the CLA Signed label Jan 16, 2019

nickgg force-pushed the compileFunc branch from 53d40a7 to ab4ea80 on January 16, 2019 23:03

opti-mix reviewed Jan 17, 2019

lib/Backends/CPU/CPUFunction.cpp (outdated)

opti-mix reviewed Jan 17, 2019

lib/Backends/Interpreter/InterpreterFunction.h (outdated)

opti-mix reviewed Jan 17, 2019

lib/Backends/OpenCL/OpenCL.h (outdated)

rdzhabarov reviewed Jan 17, 2019

nickgg force-pushed the compileFunc branch from ab4ea80 to 1c91410 on January 18, 2019 00:39

bertmaher reviewed Jan 18, 2019

nickgg force-pushed the compileFunc branch from 1c91410 to 9446ffb on January 18, 2019 22:16

nickgg force-pushed the compileFunc branch 2 times, most recently from 15cc01c to 2e6a7cd on January 22, 2019 21:46

rdzhabarov suggested changes Jan 23, 2019

rdzhabarov approved these changes Jan 23, 2019

Refactor CompiledFunction to remove per-run state (V2)

cbda3e1

nickgg force-pushed the compileFunc branch from 2e6a7cd to cbda3e1 on January 23, 2019 17:46

nickgg merged commit 73479f5 into pytorch:master Jan 23, 2019

nickgg deleted the compileFunc branch January 23, 2019 17:58

nickgg mentioned this pull request Jan 23, 2019

Remove IRFunction requirement from CollectConstants #2290

Merged

6 participants