The assembler template was backing up and restoring ebx, which is
fair enough. However, this did not prevent compiler for putting
result variables to ebx. This was causing data corruption.
In order to prevent this easiest solution is to list ebx in clobbers
for the assembly.
The script clearly states:
This makes the presumption that you are include al.h like
#include "al.h"
and not
#include <AL/al.h>
The reason for this is that the latter is not entirely portable.
Windows/Creative Labs does not by default put their headers in AL/ and
OS X uses the convention <OpenAL/al.h>.
This commit makes default precompiled OpenAL to be properly detected
and also removes hack on MacOS which was finding the OpenAL package but
then was overwriting include directory.
Note, that new audaspace in 2.8 is using expected #include <al.h>.
This is an initial implementation of BVH8 optimization structure
and packated triangle intersection. The aim is to get faster ray
to scene intersection checks.
Scene BVH4 BVH8
barbershop_interior 10:24.94 10:10.74
bmw27 02:41.25 02:38.83
classroom 08:16.49 07:56.15
fishy_cat 04:24.56 04:17.29
koro 06:03.06 06:01.45
pavillon_barcelona 09:21.26 09:02.98
victor 23:39.65 22:53.71
As memory goes, peak usage raises by about 4.7% in a complex
scenes.
Note that BVH8 is disabled when using OSL, this is because OSL
kernel does not get per-microarchitecture optimizations and
hence always considers BVH3 is used.
Original BVH8 patch from Anton Gavrikov.
Batched triangles intersection from Victoria Zhislina.
Extra work and tests and fixes from Maxym Dmytrychenko.
Misleading name since it's between 0..1.
Use as a keyword argument to prepare for keyword only args.
Also document that leaving unset has special behavior.
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
Building the CUDA kernels takes quite a bit of memory, and when building all of
them the combined usage can be too much on some systems (especially VMs).
Therefore, this patch adds an option to force the build system to build them
sequentially by making each build step depend on the previous kernel.
Reviewers: brecht, sergey
Differential Revision: https://developer.blender.org/D3623