-
Notifications
You must be signed in to change notification settings - Fork 945
Description
Hi!
first of all, I wanted to say a huge thank you to Peter Shirley for creating this fantastic tutorial series. It is amazing how intuitive and simple things are in his explanations and it is so much fun to follow. It is also fascinating what amazing results you get with so little code. I have been reading the series with great enthusiasm and already learned so much from it!
Also thank you to everyone here on github for all the maintenance work and valuable improvements to the code and the documentation. I can imagine it is hard work to keep everything updated and in sync so it is really greatly appreciated! 👍
The reason why I am writing ... (I actually just signed up at github to try to comment here 😄)
I have followed part 1 and 2 of the series, then spent some time adding a few additional features on my own (such as triangle mesh hittables, a different material etc.). Then my goal was to render the Stanford dragon test model. It is a triangle mesh and has roughly 900k triangles and that is when I suddenly hit a pretty severe bottleneck in the code. It was related to the bvh construction that it was impossible to even start rendering any pixels.
The problem seems that in every recursion step of the bvh construction, we make a copy of the entire vector of hittable pointers, even though we only need to process a handful of objects of those in the actual recursion step. So in my example, the pointers for all 900k triangles are copied in every split which led to the bvh construction taking forever (I cancelled it after a couple of hours).
After taking a closer look at the bvh construction code, I believe there is a way of optimizing the current code but at the same time making it easier to understand for beginners as well. This is why I am writing here. Because I think there is an actual benefit to review the current bvh code and make some changes. It's nothing crazy, only changing a few lines and you gain not only a huge speedup for potential large amounts of hittables but also it is a bit more intuitive as well (in my humble opinion). In the case of my dragon model, the bvh construction was reduced from many hours to ~10 seconds, yay! 😄
The proposed change also eliminates the tracking of the start and end indices and divides the input array before each recursion level. This means that only the really needed objects are passed to the next level. In a true "divide and conquer" fashion. This avoids copying data that is not really needed, the array gets smaller and smaller on each level and the speedup is huge.
As an example, let's say you have 24 objects in your scene. In the current code, all 24 objects get passed down in every step down to the leaf construction level:
In the proposed new code, we would process 24, then split the array, then process 2x12, split, then process 4x6, split, process 8x3 etc.
This is also literally more true to what the bvh actually is - a space partitioning function - not an indexing into an array function. (It would be different if we were indexing into the array for speed purposes but we are not doing this here anyways as we are copying the vector over and over).
Of course this is all not really noticeable with small object counts but for large numbers, this change is huge.
Here is the proposed bvh_node function along with some comments:
// it is no longer necessary to pass indices as arguments
// because in each recursion step, we are processing the entire given src_objects vector.
bvh_node(const std::vector<shared_ptr<hittable>>& src_objects) {
auto objects = src_objects; // Create a modifiable array of the source scene objects
int axis = random_int(0,2);
auto comparator = (axis == 0) ? box_x_compare : (axis == 1) ? box_y_compare : box_z_compare;
// how many objects are in the given array?
size_t size = objects.size ();
if (size == 1) {
left = right = objects[0];
} else if (size == 2) {
if (comparator(objects[0], objects[1])) {
left = objects[0];
right = objects[1];
} else {
left = objects[1];
right = objects[0];
}
} else {
std::sort(objects.begin(), objects.end(), comparator);
// where do we need to split?
auto mid = size/2;
// true "divide and conquer" method: split objects vector into two new vectors
// left: 0 --> mid - 1
// right: mid --> end
auto objectsLeft = std::vector<std::shared_ptr<hittable>>(objects.begin(), objects.begin()+mid);
auto objectsRight = std::vector<std::shared_ptr<hittable>>(objects.begin()+mid, objects.end());
// only pass the necessary objects into next recursion step,
// with each step, the objects vector is getting smaller and therefore faster to process
left = make_shared<bvh_node>(objectsLeft);
right = make_shared<bvh_node>(objectsRight);
}
bbox = aabb(left->bounding_box(), right->bounding_box());
}
As you can see, the function call is simplified and does not need size_t start
and size_t end
arguments anymore. We just hand over the array of objects and let it do its thing:
bvh_node(const std::vector<shared_ptr<hittable>>& src_objects)
We are still copying the input array but since it is getting halved in each recursion level, it is not too big of a deal anymore:
auto objects = src_objects; // Create a modifiable array of the source scene objects
We can eliminate object_span
as well because we are dealing with the whole array. So it just becomes size
:
size_t size = objects.size ();
We don't need start
and end
indices anymore to read from the array...
left = objects[1];
right = objects[0];
...and splitting the array for the next recursion into objectsLeft
and objectsRight
is super easy, thanks to the .begin()
and .end()
iterators:
auto objectsLeft = std::vector<std::shared_ptr<hittable>>(objects.begin(), objects.begin()+mid);
auto objectsRight = std::vector<std::shared_ptr<hittable>>(objects.begin()+mid, objects.end());
Now this is the critical part, we don't call the bvh_node()
function again with the whole objects
array as it is done currently...
left = make_shared<bvh_node>(objects, start, mid);
right = make_shared<bvh_node>(objects, mid, end);
... but instead, each recursion call is only with half of the original objects (objectsLeft
and objectsRight
):
left = make_shared<bvh_node>(objectsLeft);
right = make_shared<bvh_node>(objectsRight);
In the next recursion level we will then operate on the full objectsLeft
and objectsRight
, split each of them in half, ... and so on.
In my humble opinion, this is super easy to understand, in fact even easier than the original function and keeping track of indices.
So you could argue that the tutorial code is not meant for rendering thousands of objects anyways and there is no real speed gain for the scenes here. But I still believe that this change is worth looking at because:
- The code becomes overall simpler
- For people who actually do decide to take it further, they won't hit this road block and they could focus on more "fun" things like new shapes / materials / textures / lights etc.. 😄
If there is any interest from your side in reviewing this, please let me know! From my side, it's ready to go!
Thank you for your consideration and happy rendering everyone!
P.S.: And here is my dragon, thanks to you guys! 😄