Was happening during rendering, causing visual artifacts when doing
CPU+GPU rendering, and giving different in-progress results on different
devices.
The root of the issue is that the math used in the approximate
shadow catcher calculation could produce a negative alpha channel,
and negative values for display are handled differently on CPU and GPU.
This difference in handling is caused by an approximate conversion used on
the CPU for performance reasons.
This change makes it so no negative alpha is generated by the approximate
shadow catcher. Not sure if we need some explicit clamping somewhere to
deal with possible negative values coming from somewhere else.
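As a rough illustration of the explicit clamping mentioned above, here is a
minimal sketch. It is not what this change does (the change fixes the math
itself); `Pixel` and `clamp_negative_alpha` are hypothetical names and not
part of Cycles.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical RGBA pixel, used only for this sketch.
struct Pixel {
  float r, g, b, a;
};

// Clamp alpha to be non-negative before the pixel is converted for display,
// so every device starts from the same value regardless of how its
// conversion treats negative input.
inline Pixel clamp_negative_alpha(Pixel pixel)
{
  pixel.a = std::max(pixel.a, 0.0f);
  return pixel;
}
```

Whether such a clamp is needed, and where it would belong, is still the open
question noted above.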
The shadow catcher cornell box tests are to be updated for the new code,
but the new result seems to be more accurate.
Differential Revision: https://developer.blender.org/D13354
Noticed while looking into T93155. Steps to reproduce:
- Open the .blend file from the report
- Hit F12 to start rendering
- After some tiles were rendered hit Esc
The issue is caused by "sticky" cancel reported via Progress. This means
that once the user hits Esc, all further requests for the cancel state
return true, which was preventing the OIDN denoiser from completing the
denoising task.
Now only allow stopping the denoiser when interactive rendering requests
a very fast stop.
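The idea can be sketched as follows. This is a simplified model under
assumptions: `Progress` here is a stand-in class, and
`denoiser_should_stop` and its `interactive_fast_stop` parameter are
hypothetical names, not the actual Cycles code.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for a progress object with a "sticky" cancel flag: once set,
// every later query reports cancellation.
class Progress {
 public:
  void set_cancel() { cancel_.store(true); }
  bool get_cancel() const { return cancel_.load(); }

 private:
  std::atomic<bool> cancel_{false};
};

// Decide whether a running denoise task may be aborted. The sticky cancel
// alone is not enough; the caller must also be an interactive render that
// asked for a very fast stop (assumed to be passed in as a flag).
inline bool denoiser_should_stop(const Progress &progress, bool interactive_fast_stop)
{
  return interactive_fast_stop && progress.get_cancel();
}
```

With this, an F12 render that was cancelled with Esc still lets the denoiser
finish the tiles that were already rendered.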
Aiming the fix for 3.0 branch.
Differential Revision: https://developer.blender.org/D13352
With the current code in master, non-hardware accelerated ray tracing devices see a measurable performance decrease when scrambling distance is enabled compared to when it is off. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`.
This patch attempts to address the performance decrease by using different algorithms to calculate the tile size for devices with hardware accelerated ray traversal and devices without: large tile sizes for hardware accelerated devices and small tile sizes for others.
Most of this code is based on proposals from @brecht and @leesonw
Reviewed By: brecht, leesonw
Differential Revision: https://developer.blender.org/D13042
21.Q4 is required; older versions should not show devices in the preferences.
This adds a check for the file version of amdhip64.dll during hipew
initialization.
Differential Revision: https://developer.blender.org/D13324
Use the correct device function (hipDeviceGet) for multi GPU setups, instead
of hipGetDevice which just returns the default device.
Differential Revision: https://developer.blender.org/D13323
Fix problem with duplicated initial character when initiating or
switching to new windows. This is done by updating our copies of state
and modes from the new window when it receives WM_IME_SETCONTEXT
message. This problem and fix are only for the Windows platform.
* Rename "Auto Tiles" to "Use Tiling", since it's not really automatic and is
easily confused with the old auto tile size add-on.
* Rename "Adaptive" scrambling distance to "Automatic", to avoid confusion
with adaptive sampling.
Happens when device runs out of memory and Cycles is moving some
textures to the host memory.
The delayed memory free for OptiX BVH was moving data from one
device_memory to another, leaving the original device memory in
an invalid state. This was ruining the allocation map in the CUDA
device, which uses a pointer to the device_memory.
This change makes it so the memory pointer is stolen from BVH
into the delayed memory free list.
Additionally, forbid copying and moving instances of device_memory
and add sanity checks in the device implementation.
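A minimal sketch of the pattern, assuming a simplified handle class:
`DeviceMemory`, `steal_pointer` and `queue_delayed_free` are illustrative
names and do not match the real device_memory API.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Illustrative stand-in for a device allocation handle.
class DeviceMemory {
 public:
  explicit DeviceMemory(std::uint64_t pointer) : device_pointer(pointer) {}

  // An allocation map may store the address of this object, so copying or
  // moving the object itself is forbidden at compile time.
  DeviceMemory(const DeviceMemory &) = delete;
  DeviceMemory &operator=(const DeviceMemory &) = delete;
  DeviceMemory(DeviceMemory &&) = delete;
  DeviceMemory &operator=(DeviceMemory &&) = delete;

  // Hand the raw device pointer to the caller and leave this object empty.
  std::uint64_t steal_pointer() { return std::exchange(device_pointer, 0); }

  std::uint64_t device_pointer = 0;
};

// Queue only the raw pointer for a delayed free; the DeviceMemory object
// stays where it is, so any map keyed on its address remains valid.
inline void queue_delayed_free(DeviceMemory &memory, std::vector<std::uint64_t> &delayed_free)
{
  delayed_free.push_back(memory.steal_pointer());
}
```

Deleting the copy and move operations turns an accidental transfer of the
whole object into a compile error instead of a corrupted allocation map.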
Differential Revision: https://developer.blender.org/D13316
This fixes the app crash happening when trying to render smoke as a dense
3D texture. The changes are related to matching up hipew with the actual HIP
headers.
Differential Revision: https://developer.blender.org/D13296
With very long ray distance, OptiX ends up traversing many BVH nodes due to
a feature that improves precision. However this causes very slow rendering.
We now avoid generating such long rays by rejecting the few samples that have
long ray distances and very low probability of being generated. This should not
meaningfully affect render results.
Thanks to Sergey and Patrick for the investigation.
Previously the check was based on the image dimensions: if either
dimension was larger than the tile size, tiling was used.
This change makes it so that if the image does not exceed the number of
pixels in a tile, no tiling is used. This allows rendering widescreen
images without tiling.
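The difference between the two checks can be sketched like this. The
function names are hypothetical, and the tile is assumed to be square with
side `tile_size`.

```cpp
#include <cassert>
#include <cstdint>

// Old check, as described above: tile when either dimension exceeds the
// tile size.
inline bool needs_tiling_by_dimension(int width, int height, int tile_size)
{
  return width > tile_size || height > tile_size;
}

// New check: tile only when the image has more pixels than a single tile.
// 64-bit arithmetic avoids overflow for large resolutions.
inline bool needs_tiling_by_pixel_count(int width, int height, int tile_size)
{
  const std::int64_t image_pixels = std::int64_t(width) * height;
  const std::int64_t tile_pixels = std::int64_t(tile_size) * tile_size;
  return image_pixels > tile_pixels;
}
```

For instance, a 4096x1024 image with a tile size of 2048 has exactly as many
pixels as one tile, so the old check requests tiling while the new one does
not.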
Differential Revision: https://developer.blender.org/D13206
The calculation based on preserving device occupancy conflicted
with the fact that the time limit requires rendering fewer samples in
the last round of render work.
For example, rendering BMW27 for 30sec on i9-11900k was actually
rendering for almost a minute. Now the render time limit is respected
much more closely.
Differential Revision: https://developer.blender.org/D13269
Build HIP kernels with NanoVDB, and patch NanoVDB to work with HIP.
This is a header only library so no rebuild is needed. The changes are being
submitted upstream to openvdb, so this patch should be temporary.
Thanks Thomas for help testing this.
stack_assign_if was used in the middle of creating the shader value blocks,
which caused stack variables to be inserted in the middle of the shader value data.
This resulted in the shader node data not being in sequential order. This was also
the case for the wave texture node.
Reviewed By: brecht
Maniphest Tasks: T93102
Differential Revision: https://developer.blender.org/D13262
We need to increase GPU memory usage a bit. Unfortunately we can't get away
with writing either reflection or transmission passes because these BSDFs may
scatter in either direction but still must be in a fixed reflection or
transmission category to match up with the color passes.
Partially reverts commit rB440a3475b8f5410e5c41bfbed5ce82771b41356f because
"optixDenoiserComputeIntensity" does not currently support input images that are not packed (the
"pixelStrideInBytes" field is not zero). As a result the intensity calculation would take into account
data from other passes in the image, some of which was still scaled by the number of samples, and
therefore produce wildly incorrect results that then caused artifacts in the denoised image.
Maniphest Tasks: T93029
The root of the problem lies in a bug in OIIO which we can work around
from our side (which does not affect pack memory usage).
Thanks Brecht for finding the root cause!
Differential Revision: https://developer.blender.org/D13186
Adds a method to profiler that can be used to check if it is active.
This is used to determine if stop_profiling and start_profiling
should be called.
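A rough sketch of how such a check could gate the calls. `Profiler`, `Worker`
and `render_with_optional_profiling` are simplified, hypothetical stand-ins;
only the guarding pattern is the point.

```cpp
#include <cassert>
#include <utility>

// Simplified profiler that only knows whether it is active.
class Profiler {
 public:
  void set_enabled(bool enabled) { enabled_ = enabled; }
  bool active() const { return enabled_; }

 private:
  bool enabled_ = false;
};

// Stand-in for the code that owns the start/stop calls; the counters make
// the calls observable in this sketch.
struct Worker {
  int start_calls = 0;
  int stop_calls = 0;
  void start_profiling() { ++start_calls; }
  void stop_profiling() { ++stop_calls; }
};

// Run `work`, wrapping it in start/stop only when the profiler is active,
// so the inactive case pays no profiling overhead.
template<typename Work>
void render_with_optional_profiling(const Profiler &profiler, Worker &worker, Work &&work)
{
  const bool profile = profiler.active();
  if (profile) {
    worker.start_profiling();
  }
  std::forward<Work>(work)();
  if (profile) {
    worker.stop_profiling();
  }
}
```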
| patch    | Juans Scene UI 256 samples | Juans Scene bg 256 samples | junkshop UI | junkshop bg |
| -------- | -------------------------- | -------------------------- | ----------- | ----------- |
| No patch | 6:16.59                    | 4:05.37                    | 2:08.48     | 1:59.7      |
| D13187   | 4:12.15                    | 3:57.36                    | 2:07.25     | 1:58.16     |
| D13185   | 4:11.18                    | 3:54.74                    | 2:07.44     | 1:58.03     |
| D13190   | 4:12.39                    | 3:55.42                    | 2:07.62     | 1:58.68     |
UI - means rendered from within Blender
bg - means rendered from the command line using `blender -b scene.blend -f 1`
Reviewed By: sergey, brecht
Maniphest Tasks: T92601
Differential Revision: https://developer.blender.org/D13190
The issue was caused by splitting happening twice.
Fixed by checking for the split flag, which is assigned to both states
during the split.
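The guard can be sketched as below. The flag name and state layout are
hypothetical and much simpler than the real kernel state; the sketch only
shows why setting the flag on both states prevents a second split.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical path flag, used only for this sketch.
enum PathFlag : std::uint32_t {
  PATH_FLAG_SHADOW_CATCHER_SPLIT = 1u << 0,
};

// Hypothetical, heavily reduced path state.
struct PathState {
  std::uint32_t flag = 0;
};

// Split `main_state` into `split_state` unless a split already happened.
// The flag is set before copying, so both resulting states carry it and
// neither of them can be split again.
inline bool try_shadow_catcher_split(PathState &main_state, PathState &split_state)
{
  if (main_state.flag & PATH_FLAG_SHADOW_CATCHER_SPLIT) {
    return false;
  }
  main_state.flag |= PATH_FLAG_SHADOW_CATCHER_SPLIT;
  split_state = main_state;
  return true;
}
```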
The tricky part was to write catcher data at the moment of split: the
transparency and shadow catcher sample count is to be accumulated at
that point. Now it is happening in the `intersect_closest` kernel.
The downside is that render buffer is to be passed to the kernel, but
the benefit is that extra split bounce check is not needed now.
Had to move the passes write to shadow catcher header, since include
of `film/passes.h` causes all the fun of requirement to have BSDF
data structures available.
Differential Revision: https://developer.blender.org/D13177
This is due to a driver bug, so disable it for now until it gets resolved
in a future driver release.
Ref T92972
Differential Revision: https://developer.blender.org/D13167
It's unclear why this fails. Maybe the size of half4 is not the expected
8 bytes and adjacent pixels are overwritten. Or there is some bug in the
HIP compiler writing a struct into global memory, which we probably don't
do elsewhere in the kernel.
Thanks to Thomas, William and Jeroen for helping investigate this.
The issue was that the `object_is_geometry` method was used in two different
contexts that expected the function to behave differently. So a recent change
that fixed `object_is_geometry` for one context, broke it for the other context.
The two contexts are:
* Check if a "real" object can contain a geometry to check if it has to be tagged
for sync after an update.
* Check if an object/instance actually is a geometry that Cycles can work with.
I created a new `object_can_have_geometry` method for the first use case, instead
of trying to adapt the existing `object_is_geometry` method to serve both uses.
Additionally, I changed it so that a BObjectInfo is passed into `object_is_geometry`
to make it more explicit when this method is supposed to be used.
Differential Revision: https://developer.blender.org/D13135
Adds a pass before denoising that calculates the intensity of the image, which can be
passed into the OptiX denoiser for more optimal results for very dark or very bright images.
In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object
has to be destroyed before the OptiX device context object (since it references that). But in
C++ the destructor function of a class is called before its fields are destructed, so
"~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore
"optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash.
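The destruction order can be reproduced in isolation. The following is a
minimal sketch with hypothetical types (`Denoiser`, `DeviceBroken`,
`DeviceFixed`) that log events instead of calling OptiX; the fix shown is
one possible approach and not necessarily the one used in this change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records destruction events in the order they happen.
using EventLog = std::vector<std::string>;

// Stand-in for a member object that references the device context.
struct Denoiser {
  EventLog *log = nullptr;
  bool destroyed = false;

  void destroy()
  {
    if (!destroyed && log != nullptr) {
      log->push_back("denoiser destroyed");
      destroyed = true;
    }
  }
  ~Denoiser() { destroy(); }
};

// Problematic layout: the context is destroyed in the destructor body,
// which runs before the destructors of the members.
struct DeviceBroken {
  EventLog *log;
  Denoiser denoiser;

  explicit DeviceBroken(EventLog *l) : log(l) { denoiser.log = l; }
  ~DeviceBroken() { log->push_back("context destroyed"); }
};

// One possible fix: release the member explicitly before the context.
struct DeviceFixed {
  EventLog *log;
  Denoiser denoiser;

  explicit DeviceFixed(EventLog *l) : log(l) { denoiser.log = l; }
  ~DeviceFixed()
  {
    denoiser.destroy();
    log->push_back("context destroyed");
  }
};
```

Destroying a `DeviceBroken` logs the context first and the denoiser second,
while `DeviceFixed` logs them in the required order.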
Differential Revision: https://developer.blender.org/D13160
Adds a workaround for a driver bug in r495 that causes artifacts with OptiX denoising.
`optixDenoiserSetup` is not working properly there when called with a stream other than the
default stream, so use the default stream for now and force synchronization across the entire
context afterwards to ensure the other stream Cycles uses to enqueue the actual denoising
command cannot execute before the denoising setup has finished.
Maniphest Tasks: T92472
Differential Revision: https://developer.blender.org/D13158
Changes:
* After hitting a shadow catcher, re-initialize the volume stack taking
into account shadow catcher ray visibility. This ensures that volume objects
are included in the stack only if they are shadow catchers.
* If there is a volume to be shaded in front of the shadow catcher, the split
is now performed in the shade_volume kernel after volume shading is done.
* Previously the background pass behind a shadow catcher was done as part of
the regular path, now it is done as part of the shadow catcher path.
For a shadow catcher path with volumes and visible background, operations are
done in this order now:
* intersect_closest
* shade_volume
* shadow catcher split
* intersect_volume_stack
* shade_background
* shade_surface
The world volume is currently assumed to be CG, that is it does not exist in
the footage. We may consider adding an option to control this, or change the
default. With a volume object this control is already possible.
This includes refactoring to centralize the logic for next kernel scheduling
in intersect_closest.h.
Differential Revision: https://developer.blender.org/D13093