This repository was archived by the owner on Jul 1, 2025. It is now read-only.

Conversation

@nickgg (Contributor) commented Jan 16, 2019

Description: To support running a compiled function multiple times, particularly concurrently on different devices, we must remove per-run state from the CompiledFunction.

This is a suggested solution for the CPU and Interpreter backend:

  • For the CPUFunction: we stored the contiguous buffers for activations and weights, and so could only have one concurrent run. These buffers need to live only as long as the execution, so they have moved into the scope of the execute() method. This also means delaying the filling of those buffers until execute().
  • For the InterpreterFunction: We use various Tensor objects rather than a single weights/activation buffer. These Tensors were stored in the InterpreterFunction itself, which meant concurrent runs would overwrite intermediate values. The sensible place for these is the Context, so I've added the ability to store Tensors keyed by name to the Context and removed the three Tensor maps in InterpreterFunction. This is a general interface, but I'm not sure if it will be useful outside of the interpreter.
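The second bullet can be sketched roughly as follows. This is a simplified illustration, not Glow's actual Context API: the Tensor stand-in and the insertTensor/getTensor names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal stand-in for a tensor: just a float buffer.
struct Tensor {
  std::vector<float> data;
};

// Per-run tensors keyed by name live in the Context rather than in the
// InterpreterFunction, so concurrent runs each get their own intermediates.
class Context {
  std::unordered_map<std::string, Tensor> tensors_;

public:
  void insertTensor(const std::string &name, Tensor t) {
    tensors_[name] = std::move(t);
  }
  // Returns nullptr when no tensor with that name exists.
  Tensor *getTensor(const std::string &name) {
    auto it = tensors_.find(name);
    return it == tensors_.end() ? nullptr : &it->second;
  }
};
```

Because each run owns its Context, two concurrent executions of the same function never share these maps.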

The side effect of these changes is that the non-execute() members of CompiledFunction (setupRuns/tearDownRuns and beforeRun/afterRun) are now empty. The multi-stage execution flow is inherently stateful, so I think we should remove these methods for all backends and move their logic into the various DeviceManagers.

Testing: Unit tests in debug, release & asan.
Documentation: Will need updating, but I'm interested in people's thoughts.

Contributor:

Maybe add a comment that this creates an unowned tensor.

@nickgg (Contributor, author) commented Jan 18, 2019

Refactored based on @opti-mix's suggestion.

@bertmaher (Contributor) left a comment:

I think separating the execution state from the CompiledFunction is the right direction. I've one high-level question before I get too deep into the review though: the original intent of "setupRuns" was to prepare the device for execution of a particular model (loading the code/weights, etc.). We don't want to do that stuff on every execute, so where should that happen with this approach?

@nickgg (Contributor, author) commented Jan 18, 2019

@bertmaher That device preparation stuff should happen in the DeviceManager, I think. E.g. for the case of moving constants to the device the DeviceManager should do it in the addNetwork() call.
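The division of labor proposed here can be sketched as below. This is an illustrative shape only, not Glow's actual DeviceManager API: the class members and method signatures are assumptions for the sake of the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One-time device preparation (e.g. copying constants/weights onto the
// device) happens in addNetwork(), so there is no per-run setup left for
// setupRuns() or execute() to do.
class DeviceManager {
  std::unordered_map<std::string, std::vector<float>> deviceConstants_;

public:
  // Called once when a compiled network is added to this device.
  void addNetwork(const std::string &name, std::vector<float> constants) {
    deviceConstants_[name] = std::move(constants);
  }
  bool hasNetwork(const std::string &name) const {
    return deviceConstants_.count(name) != 0;
  }
};
```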

@nickgg (Contributor, author) commented Jan 18, 2019

Worth calling out that this PR changes the CompiledFunction interface by adding a Context * argument to execute(). We'll need to update all backends.
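In outline, the interface change looks like this. A hedged sketch: the base class shape matches the description above, while CountingFunction is a hypothetical stand-in backend added only so the example is runnable.

```cpp
#include <cassert>

class Context; // per-run inputs, outputs, and intermediates

// execute() now receives the per-run Context, so a CompiledFunction holds
// no run state and the same instance can be executed concurrently.
class CompiledFunction {
public:
  virtual ~CompiledFunction() = default;
  virtual void execute(Context *ctx) = 0; // was: execute()
};

// Hypothetical backend used only to demonstrate the call shape.
class CountingFunction : public CompiledFunction {
public:
  int runs = 0;
  void execute(Context *ctx) override {
    (void)ctx; // a real backend would read inputs/write outputs via ctx
    ++runs;
  }
};
```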

@nickgg nickgg force-pushed the compileFunc branch 2 times, most recently from 15cc01c to 2e6a7cd Compare January 22, 2019 21:46
@nickgg (Contributor, author) commented Jan 22, 2019

Can I get a review on this? I've got some follow ups piling up.

Contributor:

this should be fine for now (alloc and dealloc on every inference request). But technically we could store this in a thread-local and reuse buffers.

Contributor Author:

Yeah, if we run into CPU backend perf concerns we should add a memory pool here.
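The thread-local reuse idea floated above could look roughly like this. A hypothetical helper, not part of the PR: each thread keeps one scratch buffer that only grows, so repeated inferences on the same thread avoid re-allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns this thread's scratch buffer, grown to at least `bytes` bytes.
// The buffer never shrinks, so steady-state inference does no allocation.
inline std::vector<char> &scratchBuffer(std::size_t bytes) {
  thread_local std::vector<char> buf;
  if (buf.size() < bytes)
    buf.resize(bytes);
  return buf;
}
```

A fuller memory pool would also handle alignment and high-water-mark trimming; this only shows the reuse pattern.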

Contributor:

magic :) comment about ctx was already in place

Contributor:

hm, I'm a bit confused. Placeholders are for inputs/outputs but not for constant tensors.

Contributor Author:

I'm maintaining existing behaviour of CPUFunction in this diff, which is that all constants & placeholders have their space allocated in the RuntimeBundle and then we copy them into the per-run memory block for execution. The memory should be uninitialized so we don't need to memcpy it, but figured we could fix that when we get to it. It is a known issue with the RuntimeBundle.

Contributor:

not used.

Contributor:

you can just do execute(Context *) and remove (void)ctx.

Contributor Author:

heh, @opti-mix has a comment asking for the reverse above. Personally, not worried either way.

Contributor:

ok, does not matter indeed.

Contributor:

i'd remove comment, does not provide any additional info on top of the var name.

Contributor:

not related to this PR, but the `///@?` doc comment needs to be closed after `void tearDownRuns() override;`

@nickgg nickgg merged commit 73479f5 into pytorch:master Jan 23, 2019
@nickgg nickgg deleted the compileFunc branch January 23, 2019 17:58
@nickgg (Contributor, author) commented Jan 23, 2019

ah damn I pushed but didn't add the changes, i'll get em in the next one

@rdzhabarov (Contributor):

> ah damn I pushed but didn't add the changes, i'll get em in the next one

sounds good

