Only copy required part of volume stack instead of entire stack.
Solves time regression introduced by D12759 and avoids need in
implementing volume stack calculation to exactly match what the
path tracing will do (as well as potentially makes scenes with
a lot of volumes ans a tiny bit of deeply nested ones render
faster).
Still need to look into memory aspect of the regression, but
that is for separate patch.
Ref T92014
Maniphest Tasks: T92014
Differential Revision: https://developer.blender.org/D12790
Previously the storage here was optimized to avoid indirections in BVH2
traversal. This helps improve performance a bit, but makes performance
and memory usage of Embree and OptiX BVHs a bit worse also. It also adds
code complexity in other parts of the code.
Now decouple triangle and curve primitive storage from BVH2.
* Reduced peak memory usage on all devices
* Bit better performance for OptiX and Embree
* Bit worse performance for CUDA
* Simplified code:
** Intersection.prim/object now matches ShaderData.prim/object
** No more offset manipulation for mesh displacement before a BVH is built
** Remove primitive packing code and flags for Embree and OptiX
** Curve segments are now stored in a KernelCurve struct
* Also happens to fix a bug in baking with incorrect prim/object
Fixes T91968, T91770, T91902
Differential Revision: https://developer.blender.org/D12766
MSVC does not support variable size array definition.
Use maximum possible stack, similar to the GPU case.
Not expected to have user-measurable difference.
Make volume stack allocated conditionally, potentially based on the
actual nested level of objects in the scene.
Currently the nested level is estimated by number of volume objects.
This is a non-expensive check which is probably enough in practice
to get almost perfect memory usage and performance.
The conditional allocation is a bit tricky.
For the CPU we declare and define maximum possible volume stack,
because there are only that many integrator states on the CPU.
On the GPU we declare outer SoA to have all volume stack elements,
but only allocate actually needed ones. The actually used volume
stack size is passed as a pre-processor, which seems to be easiest
and fastest for the GPU state copy.
There seems to be no speed regression in the demo files on RTX6000.
Note that scenes with high nested level of volume will now be slower
but correct.
Differential Revision: https://developer.blender.org/D12759
Fixes:{T91064}
Caused by {rBcd118c5581f482afc8554ff88b5b6f3b552b1682}
- Applies `ensure_valid_reflection()` to the normal input on all BSDFs for CPU and GPU.
- This doesn't affect hair.
- Removes `ensure_valid_reflection()` from the output of Bump Map and Normal Map nodes for CPU/GPU as it is not needed.
- The fix doesn't touch OSL.
Reviewed By: brecht, leesonw
Maniphest Tasks: T91064
Differential Revision: https://developer.blender.org/D12403
This effectively undoes some of the following commit:
rB4537e8558468c71a03bf53f59c60f888b3412de2
The tables in question were duplicated 5-6 times into the blender
executable due to the headers being used in multiple translation units.
This contributes ~6.3kb worth of duplicate data into the binary.
Some further details are in the below revision.
Differential Revision: https://developer.blender.org/D12724
Implement an overscan support for tiles, so that adaptive sampling can
rely on the pixels neighbourhood.
Differential Revision: https://developer.blender.org/D12599
Always sample background pass behind shadow catcher (if the pass
exists, of course), regardless of whether shadow catcher will be
used as approximate or accurate.
Allows to combine accurate shadows into an environment map.
Differential Revision: https://developer.blender.org/D12747
Similar to the previous change in the area: need to avoid ray
point and direction becoming a non-finite value.
Use the view direction when the geometrical normal can not be
calculated.
Collaboration and sanity inspiration with Brecht!
Differential Revision: https://developer.blender.org/D12703
The issue was caused by hair shader setup setting normal to a non
finite value, which then gets used to create a ray with non-finite
direction, making BVH traversal to run out of stack memory.
Happens with 150_0040_A.lighting.blend frame 112 of the Sprites
project.
Differential Revision: https://developer.blender.org/D12692
Avoids possible numerical issues in the path tracing kernel, which
is most important for displacement as non-finite values in BVH can
lead to infinite node recursion during traversal.
Differential Revision: https://developer.blender.org/D12690
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.
HIP is a heterogenous compute interface allowing C++ code to be executed on
GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.
https://github.com/ROCm-Developer-Tools/HIP.
As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.
See task T91571 for more details on the current status and work remaining
to be done.
Credits:
Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)
Differential Revision: https://developer.blender.org/D12578
Before the visibility test against the visibility flags was performed in an any-hit program in OptiX
(called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array.
This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility
flags of all objects that reference that primitive, so if one object uses different visibility flags
than another object, but they both are instances of the same geometry, they would appear the same
way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is
that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles
visibility flags (12 bits).
To mostly fix the problem with multiple instances and different visibility flags, I changed things to
use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA`
to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit
program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in
curves from OptiX, since the any-hit program is then also used to skip the curve endcaps).
This may also improve performance in some cases, since by default OptiX can now perform the normal
scene intersection trace calls entirely on RT cores without having to jump back to the SM on every
hit to execute the any-hit program.
Fixes T89801
Differential Revision: https://developer.blender.org/D12604
Fix T91632: Stops the sample correlation between dimensions which was causing rendering artefacts on simple scenes.
This is done by increasing the amount of jitter the Cranley Patterson Rotation is allowed to add. Also, it uses the y dimension of the of the sample table for 1D sampling which causes further decorrelation between dimensions. As an additional measure the x and y dimensions are swapped randomly to provide further decorrelation.
Maniphest Tasks: T91632
Differential Revision: https://developer.blender.org/D12610
Goal is to add the length attribute to the Hair Info node, for better control over color gradients or similar along the hair.
Reviewed By: #eevee_viewport, brecht
Differential Revision: https://developer.blender.org/D10481
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycleshttps://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
Prior to rBb8ecdbcd964a normals were stored both in
DeviceScene.tri_vnormal and the float3 attributes buffer. However, the
normals in `DeviceScene.tri_vnormal` might have be transformed to world
space if the object's transformation was applied, while the data in the
float3 attributes buffer were not. This caused shading issues in cases
where the objects did have transformation applied, as the math expects
the normals to be in object space.
To fix this, convert the normals to object space if necessary before
applying the normal map.
Reviewed By: brecht
Maniphest Tasks: T90854
Differential Revision: https://developer.blender.org/D12294
Solves an error in the principled diffuse BSDF, where it was not correctly
rejecting directions outside the hemisphere.
Differential Revision: https://developer.blender.org/D12283
This modifies the attribute lookup to use object coordinates if no
generated coordinates are found on the geometry.
This is useful to avoid creating and copying this attribute, thus saving
a bit of time and memory.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D12238
Vertex normals are needed for normals maps and therefore are packed and send
to the device alongside the other float3 attributes. However, we already pack
and send vertex normals through `DeviceScene.tri_vnormal`.
This removes the packing of vertex normals from the attributes buffer, and
reuses `tri_vnormal` in the kernel for normals lookup for normal maps, which
reduces memory usage a bit, and speeds up device updates.
This also fixes potential missing normals updates following rB12a06292af86,
since the need for vertex normals for normals maps was overlooked.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D12237
The goal: allow to easily use AO approximation in scenes which combines
both small and large scale objects.
The idea: use per-object AO distance which will allow to override world
settings. Instancer object will "propagate" its AO distance to all its
instances unless the instance defines own distance (this allows to
modify AO distance in the shot files, without requiring to modify props
used in the shots.
Available from the new Fats GI Approximation panel in object properties.
Differential Revision: https://developer.blender.org/D12112
WITH_CYCLES_DEBUG was used for rendering BVH debugging passes. But since we
mainly use Embree an OptiX now, this information is no longer important.
WITH_CYCLES_DEBUG_NAN will enable additional checks for NaNs and invalid values
in the kernel, for Cycles developers. Previously these asserts where enabled in
all debug builds, but this is too likely to crash Blender in scenes that render
fine regardless of the NaNs. So this is behind a CMake option now.
Fixes T90240