
Conversation

@andreaskoepf
Contributor

! DO NOT MERGE YET !

This PR adds support for three new unsigned Tensor types: UShortTensor, UIntTensor, ULongTensor.
I did a first quick pass over the torch project and added the new types in the obvious places. I would be glad if developers who are more familiar with TH and torch could review the changes and let me know what is missing...

Resulting binary sizes:

File         With unsigned tensors   Original
libtorch.so  2.6M                    1.9M
libTH.so     3.4M                    2.4M

It can be seen that the binary size increases significantly.

Note: In order to generate files with cwrap for unsigned {short, int, long} I had to make a tiny modification to cwrap.

Example (similar to ByteTensor)

th> a = torch.UShortTensor({1,2,3,4})
th> a - 3
 65534
 65535
     0
     1
[torch.UShortTensor of size 4]

@andreaskoepf
Contributor Author

I agree that unsigned tensor types are not the most important feature for a machine learning library; nevertheless, they would simplify processing (e.g. import, export, interfacing with 3rd-party libraries) in certain situations. If the library bloat introduced by the unsigned types is considered unacceptably high, I could define a minimal operation set for unsigned tensor types and thereby reduce the binary size of the resulting library, while keeping the ability to convert to and from unsigned types and to execute the basic arithmetic operations...

@eladhoffer

Maybe it's better to add this as a separate package, similar to the complex tensor package

@andreaskoepf
Contributor Author

@eladhoffer Thanks for your opinion - unsigned tensors as an optional add-on package sounds reasonable. I will investigate whether this can be done without massive code duplication.

@tkoeppe
Contributor

tkoeppe commented Mar 16, 2016

Ahh, now we have three more ways to exercise undefined behaviour when converting from double values :-)

@andreaskoepf
Contributor Author

@tkoeppe The starting point for the unsigned-tensor discussion was this issue of torch-opencv. In most situations the conversion will be unsigned type -> double, which is lossless in all cases except UInt64. Regarding the other direction you are right: C does not define the result of converting a floating-point value that is out of range of the destination integer type. As long as this is clear and documented I do not see a big problem here - e.g. when you create a tensor, its memory contents are undefined as well. Of course it would be possible to initialize the memory to zero, or to check that every source element is within the range of the destination type and not NaN... but I think the spirit of torch is to keep things fast and pretty close to what C does.
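To illustrate the checked alternative mentioned above: a hypothetical helper (the name `checked_to_u16` and its signature are my own invention, not part of TH) that rejects NaN and out-of-range doubles before casting, avoiding the undefined behaviour of an out-of-range conversion:

```c
#include <stdint.h>
#include <math.h>

/* Hypothetical sketch: convert a double to uint16_t with an explicit
   range check. On failure, *ok is set to 0 and the result is 0;
   on success, *ok is set to 1. */
uint16_t checked_to_u16(double v, int *ok) {
    if (isnan(v) || v < 0.0 || v > 65535.0) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return (uint16_t)v; /* in range, so the cast is well defined */
}
```

Such a check costs a couple of comparisons per element, which is the trade-off against the "fast, close to C" approach described above.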

@tkoeppe
Contributor

tkoeppe commented Mar 16, 2016

Maybe. I've seen problems (with uint8 tensors) where a unit test would create a random tensor and increment every element by 1. Boom. It seems to me that users find it pretty hard to keep those implicit preconditions in mind...

@andreaskoepf
Contributor Author

If I find time I will go the 'separate package' route.
