Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.
The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
The RenderResult struct still has a listbase of RenderLayer, but that's ok
since this is strictly for rendering.
* Subversion bump (to 2.80.2)
* DNA low level doversion (renames) - only for .blend created since 2.80 started
Note: We can't use DNA_struct_elem_find or get file version in init_structDNA,
so we are manually iterating over the array of the SDNA elements instead.
Note 2: This doversion change with renames can be reverted in a few months. But
so far it's required for 2.8 files created between October 2016 and now.
Reviewers: campbellbarton, sergey
Differential Revision: https://developer.blender.org/D2927
This patch moves all the functionality previously in SceneRenderLayer to SceneLayer.
If we want to rename some of these structs now would be a good time to do it, before they are in SceneLayer.
Everything should be working, though I will test things further tomorrow. Once this is committed depsgraph can get
rid of the workaround added in rna_Main_meshes_new_from_object and finish whatever this patch was preventing from being finished.
This patch also adds a few placeholders for the overrides (samples, ...). These are obviously not working, so some unittests that rely on 'lay', and 'zmask' will fail.
This patch does not addressed the change of moving samples to ViewRender (I have this as a separate patch and needs some separate discussion).
Following next is the individual note of the individual parts that were committed.
Note 1: It is up to Cycles to still get rid of exclude_layer internally.
Note 2: Cycles still need to handle its own doversion for the use_layer_samples cases and
(1) Remove the override as it is
(2) Add a new override (scene.cycles.samples) if scene.cycles.use_layer_samples != IGNORE
Respecting the expected behaviour when scene.cycles.use_layer_samples == BOUNDED.
Note 3: Cycles still need to implement the per-object holdout
(similar to how we do shadow catcher).
Note 4: There are parts of the old (Blender Internal) rendering pipeline that is still
using lay, e.g., in shi->lay.
Honestly it will be easier to purge the entire Blender Internal code away instead of taking things from it bit by bit.
Reviewers: sergey, campbellbarton, brecht
Differential Revision: https://developer.blender.org/D2919
There are parts of the old (Blender Internal) rendering pipeline that is still
using lay, e.g., in shi->lay.
Honestly it will be easier to purge the entire Blender Internal code away instead
of taking things from it bit by bit.
Note: Cycles still need to handle its own doversion for theses cases and
(1) Remove the override as it is
(2) Add a new override (scene.cycles.samples) if scene.cycles.use_layer_samples != IGNORE
Respecting the expected behaviour when scene.cycles.use_layer_samples == BOUNDED.
This is tricky since we may want granular polling depending on the setting.
Or an option to pick whether we want the context or the scene to drive the
panels to prevent too many panels when mixing Eevee and Cycles for example.
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
Was giving difference when using sharpness of 1.0 and 0.999 even though the
result was expected to be really close to each other.
This SSS profile will probably be removed in the future in favor of more
physically bases Burley, but for the time being don't see anything wrong
fixing an existing code.
No color pass because it's hard to define what to use as color in a volume.
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D2903
random_id() crashes when there is no current dupli object.
We could also throw a Python error when doing it via RNA, but as far as
Cycles is concerned we need to check if instanced.
There was some changes about namespaces, which causes ambiguities.
Replaces using namespace with an explicit symbols we need. Is good idea to NOT
pull in the whole namespace anyway!
Previously we picked one of the RGB channels with equal probability, but this
works poorly in a dense volume after many bounces. Now we take into account
the throughput and single scattering albedo.
This makes it a little more practical to do brute force SSS with volumes, but
is still very inefficient because we do direct light sampling at every volume
bounce even when inside an opaque mesh. In theory there could be a light inside
the mesh so we can't automatically disable direct lighting.
In fact this was an existing issue when exceeding the number of available
closure, but it's more common now that we set the number to 0 for shadows
and emission