Fixed `Module.to()` bugs with `ParameterList` and `ParameterDict`, and with autograd tracking movements between CPU & GPU #1181
Conversation
    alreadyHandled.Add(p.handle);
    break;
}
var fieldsByComponentName = GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance)
Does this have any performance implications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the assumption that the costly operation is the reflection, there shouldn't be.
We're creating a hash collection of the same size (alreadyHandled vs. the Dictionary), so the added cost would be the lookups in the dictionary, which should be minimal.
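As a sketch of the pattern being discussed (names and the key convention here are illustrative, not the exact PR code): the reflection runs once per call, and each registered parameter then costs a single hash lookup.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

class FieldLookupSketch
{
    private object _weight = new();
    private object _bias = new();

    // Reflection runs once per call: O(#fields), same as before.
    public void Reassign(IEnumerable<(string name, object moved)> registered)
    {
        var fieldsByComponentName = GetType()
            .GetFields(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance)
            .ToDictionary(f => f.Name.TrimStart('_'));   // key convention is illustrative

        // Each registered component now costs one O(1) hash lookup
        // instead of a second reflection pass.
        foreach (var (name, moved) in registered)
            if (fieldsByComponentName.TryGetValue(name, out var field))
                field.SetValue(this, moved);
    }
}
```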
This brings up a functional concern -- the alreadyHandled set is there to make sure that we don't accidentally deal with the same tensor twice. Is that no longer a possibility?
I don't think so.
The reason it was needed before was that the function iterated through two different lists of parameters: one built using reflection (the parameters registered through the `RegisterComponents()` function), and the other using the internal list of registered parameters.
In the proposed code we only go through the list of registered parameters, so there isn't a concern of dealing with the same tensor twice.
Unless there is a case where someone can register the same tensor twice?
If someone registers the same parameter under two different names, then we have an issue: the first encounter will dispose the parameter, and the second will then hit a null tensor error.
Is this a use case we should handle?
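For concreteness, a hypothetical repro of that scenario (the module and member names are invented for illustration):

```csharp
using TorchSharp;
using static TorchSharp.torch;

// Hypothetical module registering one Parameter instance under two names.
class Aliased : nn.Module
{
    public Aliased() : base(nameof(Aliased))
    {
        var shared = nn.Parameter(randn(3, 3));
        register_parameter("first", shared);
        register_parameter("second", shared);   // same native handle, second name
    }
}

// With a single pass over registered parameters, handling "first" would
// dispose `shared`, and "second" would then see an invalid (null) tensor:
// var m = new Aliased().to(ScalarType.Float64);
```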
foreach (var (name, buffer) in named_buffers(false).ToList()) {
    if (alreadyHandled.Contains(buffer.handle)) continue;
    var t = buffer.to(dtype);
if (!buffer.toWillCopy(dtype ?? buffer.dtype, device ?? buffer.device)) continue;
Are old buffers/parameters disposed anywhere?
Yup. In the `.to()` call I use the `disposeAfter` parameter.
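Roughly, the call shape is this (a sketch; the exact `to()` overload is assumed from the comment above, so check the overload list):

```csharp
using TorchSharp;
using static TorchSharp.torch;

var buffer = zeros(4);
// disposeAfter: true releases the source tensor once the moved copy exists,
// so the old buffer doesn't linger until a later dispose/GC sweep.
var moved = buffer.to(float64, disposeAfter: true);
```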
Fixes #1148 and #1179.
On a whim, I attempted a style change to the `_toEpilog` method. If you see a fatal flaw, or would prefer to leave it as it used to be, no problem; I'll revert that change.
Main changes:
1] Merged the three `_toEpilog` methods into one method.
2] Instead of re-iterating the fields every time and reassigning the parameters, I build a dictionary of the fields; as we go through the registered parameters, we check whether each has a corresponding field and, if so, assign it. This saves us from duplicating the code that handles moving the parameters.
3] When moving Parameters, we do it in a `torch.no_grad()` scope to keep autograd from tracking the movement, so the resulting tensor is a leaf (see the sketch after this list).
4] After moving the parameters and buffers, the old ones are disposed.
5] Overrode the `_to()` method in the `ParameterList` and `ParameterDict` classes. I didn't do it in the `ModuleDict` and `ModuleList` classes, since modules themselves don't get a new reference, only their parameters and buffers, and `_toEpilog()` iterates through every registered module.
6] Added a `Tensor` extension method, `toWillCopy()`, which, given a set of arguments to `to()`, returns whether the tensor will be copied (see the sketch below).
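Tying items 3, 4, and 6 together, a sketch (the `toWillCopy()` body here is a plausible implementation inferred from this description, not necessarily the PR's exact code):

```csharp
using TorchSharp;
using static TorchSharp.torch;

public static class TensorMoveExtensions
{
    // Item 6: would to() with these arguments allocate a new tensor?
    public static bool toWillCopy(this Tensor t, ScalarType dtype, Device device) =>
        t.dtype != dtype || t.device_type != device.type || t.device_index != device.index;
}

// Items 3 and 4, sketched: move inside no_grad() so autograd doesn't record
// the copy (the result stays a leaf), then dispose the stale source tensor.
//
// if (p.toWillCopy(float32, CUDA)) {
//     using (no_grad()) {
//         var moved = p.to(float32, CUDA);
//         p.Dispose();                       // item 4: release the old tensor
//         // ... reassign `moved` via the field dictionary from item 2
//     }
// }
```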
A question about the memo properties `_deviceType` and `_deviceIndex`: I see the value in having them, but they can cause an issue. If someone calls `.to()` on a submodule and then calls it on the main module to make sure all the submodules are aligned, it won't work. The same applies if any of the parameters are moved separately; calling `Module.to()` won't behave as people expect.