Conversation

@bertie2 commented Jul 11, 2023

Hi, just wanted to contribute the first of a few optimizations I have made.

Please let me know if you actually want this kind of contribution before I raise more of these; I will be focusing more on runtime optimizations and parallelism than on the pure maths question of how to sort out the cube rotations.

As such this isn't a pure maths optimization; in fact it technically creates more data, but it does more of the work in numpy's C code and so is much quicker.

Instead of hashing with your RLE, we convert the bits straight into unsigned ints with np.packbits, then concatenate those ints into a single Python int object for the hash.

Finally, we add the dimensions of the array to the lowest part of the integer so we don't lose that information.
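
For readers following along, a minimal sketch of this scheme, assuming a boolean voxel array (names here are illustrative, not the PR's exact code, which appears in the review thread below):

import numpy as np

def pack_hash(polycube):
    # Pack 8 voxels per byte; np.packbits does this in C, which is
    # where the speedup comes from.
    packed = np.packbits(polycube.flatten())
    # Concatenate the packed bytes into one arbitrary-precision Python int.
    data = int.from_bytes(packed.tobytes(), "big")
    # Push the volume dimensions into the lowest bits, so two cubes with
    # the same bit pattern but different shapes hash differently.
    for dim in polycube.shape:
        data = (data << 8) | dim
    return data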

Benchmark attached for n=10 (absolute times are hardware dependent, but there is a ~2x speedup over the old code):

PS C:\Users\georg\orig_cubes> python.exe .\cubes.py --no-cache 10
Generating polycubes n=3: 100%
Generating polycubes n=4: 100%
Generating polycubes n=5: 100%
Generating polycubes n=6: 100%
Generating polycubes n=7: 100%
Generating polycubes n=8: 100%
Generating polycubes n=9: 100%
Generating polycubes n=10: 100%
Found 346543 unique polycubes
Elapsed time: 450.932s

PS C:\Users\georg\orig_cubes> python.exe .\improved_hash.py --no-cache 10
Generating polycubes n=3: 100%
Generating polycubes n=4: 100%
Generating polycubes n=5: 100%
Generating polycubes n=6: 100%
Generating polycubes n=7: 100%
Generating polycubes n=8: 100%
Generating polycubes n=9: 100%
Generating polycubes n=10: 100%
Found 346543 unique polycubes
Elapsed time: 230.274s

Unfortunately, since this speeds up the hashing itself rather than the search, it is only a constant-factor speedup, so it will only push the maximum n slightly higher.

@bertie2 (Author) commented Jul 11, 2023

Implemented parallelism and threw more cores at the problem. Each parallel thread is much slower, but more cores are more cores. n=10 benchmark for the parallel run:

PS C:\Users\georg\orig_cubes> python.exe .\paralel.py --no-cache 10
Generating polycubes n=3: 100%
Generating polycubes n=4: 100%
Generating polycubes n=5: 100%
Generating polycubes n=6: 100%
Generating polycubes n=7: 100%
Generating polycubes n=8: 100%
Generating polycubes n=9: 100%
Generating polycubes n=10: 100%
Found 346543 unique polycubes
Elapsed time: 64.173s
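
For the curious, a rough sketch of how this kind of parallel split can look with multiprocessing; expand() and pack_hash() below are hypothetical stand-ins for the repo's own expansion and hashing helpers, and paralel.py may well be structured differently:

from multiprocessing import Pool

def expand_chunk(chunk):
    # Worker: grow every polycube in this chunk by one cube and
    # deduplicate locally by canonical hash.
    local = {}
    for cube in chunk:
        for candidate in expand(cube):
            local[pack_hash(candidate)] = candidate
    return local

def generate_parallel(prev_generation, workers=8):
    # Split the previous generation across processes; each worker returns
    # a locally deduplicated dict, and merging in the parent removes the
    # duplicates found by different workers.
    chunks = [prev_generation[i::workers] for i in range(workers)]
    merged = {}
    with Pool(workers) as pool:
        for local in pool.map(expand_chunk, chunks):
            merged.update(local)
    return list(merged.values())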

@mikepound (Owner)

Very nice :) I'm fairly happy with these kinds of optimisations, with only a few caveats right now:

  1. I clearly didn't think through what would happen when I posted code from a video and got spammed with pull requests :) Realistically I can keep an eye on this for a few days, but probably not long term.
  2. I'd like to keep an original copy of the file either in a branch or alongside any changes for people watching the video for the first time
  3. Any merges are likely to interfere with all the other pull requests and comments; again, I've not really thought this through!

@bertie2 (Author) commented Jul 11, 2023

That's fair. Depending on how things go (I suspect this should calm down eventually once the obvious ideas have been had and implemented), it might be worth spinning up another repo for people to collaborate in, separate from the one linked in the video, appointing some maintainers, and muting it; then you can let the community drive it and we can stop spamming you :)

For now I'm going to keep working on this in my own repo, but I won't be posting anything else here except the npy files, to hopefully give you a bit of peace.

@mikepound (Owner)

This is a great idea! I may do this tomorrow if I have time :)

cubes.py Outdated
data = 0
for part in pack_cube:
    data = (data << 8) + int(part)
out = (data * 100000) + (polycube.shape[0] * 1000) + (polycube.shape[1] * 10) + (polycube.shape[2])  # add volume dimensions to lower bits of hash
Doesn't this break for dimensions above 10, via the shape[2] component?

The fields can collide: shape = (11, 11, 11) generates
xxxxxxxx011121
instead of the intended
xxxxxxxx111111
which is also what, e.g., shape = (11, 12, 1) would produce.

@bertie2 (Author) commented Jul 12, 2023

Good point, yes. I was intending for it to be 3 digits per dimension so that there would be plenty of space, since unless there's a breakthrough somewhere n > 99 is completely infeasible; I just got my maths very wrong.
In reality, though, the chance of a collision at these sizes is vanishingly small; in fact there were no collisions for n=12 or n=13. But I will fix it in my next run, and this definitely needs to be fixed before this ever gets merged (if it ever does).
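
For reference, a sketch of the widened encoding described here (illustrative, not the committed fix): with three decimal digits reserved per dimension, the shape fields can no longer overflow into one another for any feasible n.

out = (data * 1_000_000_000
       + polycube.shape[0] * 1_000_000
       + polycube.shape[1] * 1_000
       + polycube.shape[2])  # three digits per dimension, safe up to n = 999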

@mikepound (Owner)

I've created a new repo here: https://github.com/mikepound/opencubes

Are you interested in helping maintain this? GitHub seems to have changed its UI again; I don't use it much. I think I invite people as collaborators first, then promote them to maintainer?

@bertie2 (Author) commented Jul 12, 2023

Happy to maintain this, though I fear my usefulness is coming to an end; the scaling memory costs are already becoming a hindrance to my changes at n=14.

@bertie2 (Author) commented Jul 12, 2023

Also, yes: once you have invited someone as a collaborator you should be able to set their permissions, though I'm used to private repos on GitHub Business and am not sure whether there are arbitrary limits on open projects.

@mikepound (Owner)

Oddly, it doesn't seem to have any permission settings at all; that could simply be a free vs. paid feature. I think you have access, so it's maybe worth pushing something and seeing if it works!

@bertie2 (Author) commented Jul 12, 2023

I can confirm I can push. The real question will come when a pull request or issue comes up; I will see what my options are there.
Thanks.

@RibenaJT

Returning the array + dimensions in byte format (as opposed to shifting bits) makes this about 15% or so faster:

return polycube.tobytes() + polycube.shape[0].to_bytes(1, "big") + polycube.shape[1].to_bytes(1, "big") + polycube.shape[2].to_bytes(1, "big")  # append the volume dimensions so shape info is kept
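
One possible combination of this with the packbits idea above, as a hedged sketch (function name hypothetical; the explicit byte order is needed before Python 3.11, where int.to_bytes gained default arguments):

import numpy as np

def cube_id(polycube):
    # Pack 8 voxels per byte in C, then append one byte per dimension so
    # shape information is kept. bytes objects are directly hashable, so
    # no big-int construction is needed.
    return (np.packbits(polycube.flatten()).tobytes()
            + polycube.shape[0].to_bytes(1, "big")
            + polycube.shape[1].to_bytes(1, "big")
            + polycube.shape[2].to_bytes(1, "big"))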

@bertie2 (Author) commented Jul 12, 2023

Development of this is now continuing over at https://github.com/mikepound/opencubes, but yes, your way is about 15% faster (in terms of impact on total runtime, which means it is a lot faster on its own) and more concise; I will try to merge it when the rest of my changes go in.
