Tried 101 but it gives colisions.
I think 257 is enough now that we dont have thousands of uniforms.
This gives some noticeable performance improvement.
Could be refined further.
The issue was caused by light sample being evaluated to nan at some point.
This is root of the cause which is to be fixed, but is very hard to trace down
especially via ssh (the issue only happens on AVX2 release build). Will give it
a closer look when back to my AVX2 machine.
For until then this is a good check to have anyway, it corresponds to what's
happening in regular radiance sum.
This changes quite a few things:
- Drops the allocation of inputs as a chunk.
- Merge the linked list system into the Gwn_ShaderInput.
- Put name buffer into another memory block, easily resizable.
- Use offset instead of char* to direct to input name.
- Add only requested uniforms dynamicaly to the Shader Interface.
This drops some minor optimisation and use a bit more memory for small shaders (which are fixed count).
But this saves a lot of memory when using UBOs because the names and the Gwn_ShaderInput were alloc'ed for every UBO variable.
This also reduce the Shader Interface initial generation.
The lookup time is left unchanged.
Camera clipping was left to default values, which won't work well for
very large (or small) objects. Now recompute valid clipping start/end
based on boundingbox of rendered data, and final location of camera.
This makes brush influence into a tube instead of a sphere.
It can be used along the outline of a mesh to adjust it's silhouette.
Note that all this takes advantage of changes from vertex paint,
from testing this seems useful so exposing from the brush options.
This is an internal structure, and we don't put it to a list for anything else
that hash collision resolution. No need to have dedicated entry here, saves us
from extra allocation and pointer dereference.
This way we reduce number of loops from look-over-all-inputs to
loop-over-collision, which is expected to be much less CPU ticks.
There is still possible optimization: use memory pool of some sort
to manage memory needed for hash entries, but that will only speedup
shader interface construction / deconstruction time.
There are also some trickery happening to speed up process even more
in the case there is no hash collisions detected when constructing
shader interface.
This behavior makes more sense for sculpt, less so for painting.
Restores non PBVH behavior, adding `BKE_pbvh_find_nearest_to_ray` -
similar to ray-cast except it finds the closest point on the surface.
The work size is still very conservative, and this doesn't help for progressive
refine. For that we will need to render multiple tiles at the same time. But this
should already help for denoising renders that require too much memory with big
tiles, and just generally soften the performance dropoff with small tiles.
Differential Revision: https://developer.blender.org/D2856