In the current OpenCL implementation we have a work-around for platforms
that didn't support NULL pointers. We used to replace all NULLs and
empty arrays with a pointer to a single byte on the OpenCL Device.
During investigation of {T65924} it was asked to remove this work-around
for testing. This change improves the render times.
SCENE | BEFORE | AFTER
--------------------+--------+-------
bmw27 | 108 | 89
barbershop_interior | 867 | 673
classroom | 270 | 173
fishy_cat | 244 | 196
koro | 249 | 207
pavillon_barcelona | 582 | 414
Note that this change does not fix T65924 it just improves the
rendering performance for OpenCL. We haven't tested this patch on all
platforms so we should keep an eye out on the tracker.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D6391
This adds compaction support for OptiX acceleration structures, which reduces the device memory footprint in a post step after building. Depending on the scene this can reduce the amount of used device memory quite a bit and even improve performance (smaller acceleration structure improves cache usage). It's only enabled for background renders to make acceleration structure builds fast in viewport.
Also fixes a bug in the memory management for OptiX acceleration structures: These were held in a dynamic vector of 'device_memory' instances and used the mem_alloc/mem_free functions. However, those keep track of memory instances in the 'cuda_mem_map' via pointers to 'device_memory' (which works fine everywhere else since those are never copied/moved). But in the case of the vector, it may decide to reallocate at some point, which invalidates those pointers and would result in some nasty accesses to invalid memory. So it is not actually safe to move a 'device_memory' object and therefore this removes the move operator overloads again.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D6369
When encountering an error during context creation, the "OptiXDevice" constructor aborts early.
This means the "cuda_stream" vector is never resized and the destructor iterated over non-existent data.
The acceleration structure built by OptiX may be different between GPUs, so cannot assume the memory size is the same for all.
This fixes that by moving the memory management for all OptiX acceleration structures into the responsibility of each device (was already the case for BLAS previously, now for TLAS too).
Calling "OptiXDevice::load_kernels" multiple times would call "optixPipelineDestroy" on a pipeline
pointer that may have already been deleted previously (since the PIP_SHADER_EVAL pipeline is only
created conditionally).
This change also avoids a CUDA kernel reload every time this is called. The CUDA kernels are
precompiled and don't change, so there is no need to reload them every time.
The multi device code did not correctly handle cases where some GPUs store a
resource in device memory and others store it in host mapped memory.
Differential Revision: https://developer.blender.org/D6126
Was caused by D6068, which did not handle "MEM_PIXELS" memory
when not in background mode. Before that it always fell back to using
generic device memory, so restoring that behavior. In future this
should be changes to use OpenGL interop for optimal performance.
The OptiX implementation wasn't trying to allocate memory on the host if device allocation failed, while the CUDA implementation did. This copies the implementation over to OptiX to remedy that.
Differential Revision: https://developer.blender.org/D6068
This is intended for developers on Windows primarily:
Now, CUDA architectures of type compute_xx are supported. This allows for quicker builds,
at the expense of the CUDA driver running ptxas the first time a kernel is loaded.
Differential Revision: https://developer.blender.org/D5953
Rendering would produce invalid results or crash if the Vector pass was active but motion blur was inactive. This caused the OptiX BVH to be built with motion (because objects reported motion available), but the pipeline to be built without motion support (since with disabled motion blur this is not in the list of requested features). The two are not compatible and therefore caused issues. This patch fixes that by not building the BVH with motion if motion blur is not active (which makes sense).
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5968
Curves with motion blur produced wrong results with OptiX (T69801). This is because the AABBs for the motion steps were calculated from incorrect attribute data because the offset into the attribute data array was incorrect.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5961
The "optix_devices" array was not freed on exit, which caused a memory leak (see T69801).
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5944
OpenCL Parallel compilation only works inside Blender. When using cycles in a different setup (standaline or other software) it failed compiling kernels as they don't have the appropriate Python API and command line arguments.
This change introduces a `running_inside_blender` debug flag, that triggers out of process compilation of the kernels. Compilation still happens in subthread that enabled the preview kernels and compilation of the kernels during BVH building
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5439
Before there were two options: Paste to original layer called "Paste" and Paste to active layer called "Paste & Merge"
Now, by default the paste is in active layer and the "Paste & Merge" has been renamed "Paste".
For old "Paste", now is called "Paste by Layer" and it's not the default value anymore.
Note: Minor edits to add icons not present in Differential revision.
Differential Revision: https://developer.blender.org/D5591
x64 builds with WITH_CYCLES_OPTIMIZED_KERNEL_SSE2 not defined
since SSE2 is the lower bar for x64 cpus. Turning the architecture
logging related if into the last if in the architecture detection
chain, which will never execute unless you turn off all kernels
in de debug flags.
Reviewers: brecht
Differential Revision: https://developer.blender.org/D5579
The issue was caused by un-initialized local storage for volume
intersection hits which are supposed to be stored in per-thread
KernelGlobals.
Fix is to make thread_shader() be the same as thread_render() in
respect of KernelGlobals.
Reviewers: brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5230
We want users to go to the current version for their current version
when possible if not point to latest.
/dev should really only be for development related work. End users
should not be browsing /dev unless they are reading about upcoming
features ahead of time.
When compute preemption is available we schedule more work which is more
efficient. However the CUDA driver appears to be incorrectly reporting this as
unavailable, even though it should be supported starting with Windows 10 1803
and Pascal and Turing (10x0 and 20x0) graphics cards.
This reduces render time by about a 25% difference on our benchmark scenes. On
Linux compute preemption appears to be reported correctly.
Previously, bright edges (e.g. caused by rim lighting) would sometimes get
halos around them after denoising.
This change introduces a log(1+x) highlight compression step that is performed
before denoising and reversed afterwards. That way, the denoising algorithm
itself operates in the compressed space and therefore bright edges cause less
numerical issues.
The kernel does not use AVX2 vectorization, and trying to use BVH8 was
leading to an empty scenes.
Fixes T64624: Ctest : Win32 + AVX2 fails virtually all cycles tests
This adds our own OSL texture handle, that has info for OIIO textures or our
own custom texture types. A filename to handle hash map is used for lookups.
This is efficient because it happens at OSL compile time, because the optimizer
can figure out constant strings and replace them with texture handles.
It's effectively always enabled, only not on some unsupported OpenCL devices.
For testing those it's not useful to disable these features. This is replaced
by the more fine grained feature toggles that we have now.
This version fixes various bugs, and there is no need anymore to use both
9.1 and 10.0 for different cards.
There is a bug related to WITH_CYCLES_CUBIN_COMPILER and bump mapping in the
regression tests, so that remains disabled same as it was for CUDA 10.0.
Fix T59286: CUDA bake failing on some cards.
Fix T56858: CUDA 9.2 and 10 issues.
The main goals of this change is faster starting when using foreground
rendering.
This patch will build kernels in parallel to the update process of
the scene. When these optimized kernels are not available (yet) an AO
kernel will be used.
These AO kernels are fast to compile (3-7 seconds) and can be
reused by all scenes. When the final kernels become available we
will switch to these kernels.
In background mode the AO kernels will not be used.
Some kernels are being used during Scene update (displace, background
light). When these kernels are being used the process can halt until
these become available.
Reviewed By: brecht, #cycles
Maniphest Tasks: T61752
Differential Revision: https://developer.blender.org/D4428
The functions that determine the program name + filename of kernels
were missing some base kernels like denoising and base. For completeness
I added those kernels so the function returns the correct results.