Running Xcode memory graphs and the Instruments tools revealed
memory leaks caused, in the main, by over-retained objects.
This removes the unnecessary 'retains' and adds some asserts
to guard against over-retaining in the future.
There are a few memory leaks remaining involving PyUnicode_DecodeUTF8
but I am unable to identify the cause of these at this time.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/129117
Changing 3D curve properties while viewport rendering was active
resulted in an error, because Cycles would attempt to update the
acceleration structure containing the curves, but that acceleration
structure was built without the
`OPTIX_BUILD_FLAG_ALLOW_UPDATE` flag allowing updates. This
fixes that by adding the flag to all curve build inputs.
Ideally could just use the same flags as for other build inputs and
differentiate between viewport and final rendering (based on
`bvh_type`), but that's not currently an option since the same flags
have to be specified to query the curve intersection module in
`load_kernels()`, where that differentiation is not known. See also
commit 5c6053ccb1.
Pull Request: https://projects.blender.org/blender/blender/pulls/129634
This commit removes support for Vega GPUs from the AMD HIP backend of
Cycles. This is being done as:
- AMD no longer provides official support for Vega GPUs in their
ROCm software.
- Vega GPUs have rendering artifacts on all supported platforms,
and as a result of the reduction of support from AMD, are unlikely
to be fixed. Rendering artifacts include.
- The incorrect shading of volumes (Windows and Linux)
- Missing intersections on many meshes with HIPRT
- Crashing rendering subsurface scattering materials (Linux)
- And more.
Pull Request: https://projects.blender.org/blender/blender/pulls/129523
The `osl_wavelength_color_vf` intrinsic was missing an implementation for
OptiX, causing a link error when attempting to load OSL shaders using the
wavelength node.
Pull Request: https://projects.blender.org/blender/blender/pulls/129372
For now, PointerRNA is made non-trivial by giving explicit default
values to its members.
Besides of BPY python binding code, the change is relatively trivial.
The main change (besides the creation/deletion part) is the replacement
of `memset` by zero-initialized assignment (using `{}`).
makesrna required changes are quite small too.
The big piece of this PR is the refactor of the BPY RNA code.
It essentially brings back allocation and deletion of the BPy_StructRNA,
BPy_Pointer etc. python objects into 'cannonical process', using `__new__`,
and `__init__` callbacks (and there matching CAPI functions).
Existing code was doing very low-level manipulations to create these
data, which is not really easy to understand, and AFAICT incompatible
with handling C++ data that needs to be constructed and destructed.
Unfortunately, similar change in destruction code (using `__del__` and
matching `tp_finalize` CAPI callback) is not possible, because of technical
low-level implementation details in CPython (see [1] for details).
`std::optional` pointer management is used to encapsulate PointerRNA
data. This allows to keep control on _when_ actual RNA creation is done,
and to have a safe destruction in `tp_dealloc` callbacks.
Note that a critical change in Blender's Python API will be that classes
inherinting from `bpy_struct` etc. will now have to properly call the
base class `__new__` and/or `__init__`if they define them.
Implements #122431.
[1] https://discuss.python.org/t/cpython-usage-of-tp-finalize-in-c-defined-static-types-with-no-custom-tp-dealloc/64100
In volume segment, the minimal angle formed by the emitter bounding cone
axis and the vector pointing from the cluster centroid to any point on
the ray is computed via `dot(bcone.axis, point_to_centroid)`, see Fig.8.
in paper.
For distant light this angle is 0, but due to numerical issues this is
not always true. Therefore explicitly assign `-bcone.axis` to
`point_to_centroid` in this case.
Pull Request: https://projects.blender.org/blender/blender/pulls/129489
Currently the number of shadow paths is multiplied by the ratio of
0.5f which would half the number of paths. However, the index can
never be smaller than the number of paths so the shadow paths will
always be compacted.
Pull Request: https://projects.blender.org/blender/blender/pulls/125048
Looks like some recent changes in the driver broke an assumption
the OptiX denoiser code in Cycles made about being able to set it up
with a different input size than later used to invoke it, which caused
broken output on older GPU architectures. This commit fixes that by
ensuring the input image size passed to `optixDenoiserSetup` matches
that passed to `optixDenoiserInvoke`, even when no tiling is used
(which is the common case).
Pull Request: https://projects.blender.org/blender/blender/pulls/129398
This PR fixes a latent issue arising from invalid use of `accept_any_intersection(true)` when performing SSS ray-stepping with MetalRT. The comment incorrectly states that "we can optimize and accept the first hit", but to guarantee correct behaviour in future we need to request the closest hit.
Draine phase function sampling internally use Henyey-Greenstein and
Rayleigh sampling for degenerated cases, but the sampling pattern was
different between Draine and Rayleigh. The commit effectively replace
`rand` with `1 - rand` in Rayleigh sampling.
Pull Request: https://projects.blender.org/blender/blender/pulls/129261
When `g == 0`, the Draine phase function from
https://doi.org/10.1051/0004-6361/202142437 simplifies to
\[\Phi(\theta)=\frac{3}{4\pi(3+\alpha)}(1+\alpha\cos^2\theta).\]
Similar as Rayleigh sampling in https://doi.org/10.1364/JOSAA.28.002436,
The solution to the CDF of the marginal density function is
\[\cos^3\theta+a\cos\theta+b=0,\]
with
\[a=\frac{3}{\alpha},\quad b=\frac{3+\alpha}{\alpha}(2\xi_1-1),\]
which has only one real root since \(\alpha > 0\),
resulting in the sample technique
\[\cos\theta=u-\frac{1}{\alpha u}.\]
Pull Request: https://projects.blender.org/blender/blender/pulls/129259
Fix the failing rendering of volumes on Windows with HIP SDK 6.1
by reducing the optimization level.
There should be no functional or performance difference for the average
user as the Blender foundation currently does not use HIP SDK 6.1
on Windows. This change is primarily to fix issues for community members
building Blender locally.
Pull Request: https://projects.blender.org/blender/blender/pulls/128836
Gitea would complain the apostrophe in one of the code comments in
tree.h was an ambiguous Unicode character. So fix it by swapping it
for a more common apostrophe type.
Previously, in case of a failure during BVH transfer, when running out
of memory for example, we could get an error such as "BVH failed to
migrate to the GPU due to Embree library error (no error)", because
embree error status was actually reset before being queried.
This commit fixes its propagation.
Pull Request: https://projects.blender.org/blender/blender/pulls/129022
The ray offsetting triangle tests are not numerically identical to
those found in custom BVH implementations.
There was a TODO to fix this, but there was no explaination for why
it should be done. This fixes that.
This probably should always have been the value used, really.
Now, instead of reporting `Qualcomm Technologies Inc`, it reports the more informative `Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU` on a Thinkpad T14s Gen6 device.
Pull Request: https://projects.blender.org/blender/blender/pulls/128808
This packs the SVM stack, current node offset and closure weight into one struct, and just passes that to each SVM node implementation.
This way we don't have to pass the offset back and forth all over the place, and adding additional state (e.g. for layering in the future) becomes easier.
Pull Request: https://projects.blender.org/blender/blender/pulls/110443
- Deduplicate Fisheye projection code
- Replace spherical/cartesian conversions with shared helpers
- Replace transforms from/to local coordinate systems with shared helpers
The main type of repeated transform that's not covered here is `to/from_coords`, but with separate values for xy and z (e.g. BSDFs that already computed `dot(wi, N)` earlier, so they only need `dot(wi, X)` and `dot(wi, Y)` later). Could also be replaced, but it would feel weirdly specific for a helper function.
Pull Request: https://projects.blender.org/blender/blender/pulls/125999
Previously, when compiling on Rocky Linux 8 with fno-honor-nans, compile
time was more than 5x longer than expected, and there was an unresolved
symbol to __sqrtf_finite in GPU binaries.
Once defining sqrtf in compat.h, both issues are effectively gone, this
was certainly due to problematic interactions with build system's math
library headers.
So we can remove current workaround of defining fhonor-nans, and now
have the same set of flags on both Windows and Linux.
Fixes a issue where the Principled BSDF would render incorrectly if
`__SUBSURFACE__` is off. Which is common when using adaptive kernel
compilation (a unsupported Cycles feature).
Pull Request: https://projects.blender.org/blender/blender/pulls/128003
Turns out it is possible to have code to pick up wrong class
when defining a friend:
```
intern\cycles\device/memory.h(255): warning C4099: 'GPUDevice': type name first seen using 'struct' now seen using 'class'
source\blender\gpu\GPU_platform.hh(69): note: see declaration of 'GPUDevice'
```
Now made it so the classes have forward declaration in the CCL
namespace, avoiding possible conflict with the classes with the
same name in the global namespace.
Pull Request: https://projects.blender.org/blender/blender/pulls/128485
This adds feature parity with Cycles regarding light and shadow liking.
Technically, this extends the GBuffer header to 32 bits, and uses
the top bits to store the object's light set membership index.
The same index is also added to `ObjectInfo` in place of padding bytes.
For shadow linking, the shadow blocker sets bitmask is stored per
tilemap. It is then used during the GPU culling phase to cull objects
that do not belong to the shadow's sets.
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/127514
Previously, Cycles only supported the Henyey-Greenstein phase function for volume scattering.
While HG is flexible and works for a wide range of effects, sometimes a more physically accurate
phase function may be needed for realism.
Therefore, this adds three new phase functions to the code:
Rayleigh: For particles with a size below the wavelength of light, mostly athmospheric scattering.
Fournier-Forand: For realistic underwater scattering.
Draine: Fairly specific on its own (mostly for interstellar dust), but useful for the next entry.
Mie: Approximates Mie scattering in water droplets using a mix of Draine and HG phase functions.
These phase functions can be combined using Mix nodes as usual.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/123532
This enables most of the GPU compiler's optimizations while -ffast-math
isn't set at DPC++ level.
It brings an overall 1% speedup and currently doesn't change the unit
tests pass rate.
This enables three additional math optimizations:
-ffp-contract=fast (enables FMA generation)
-freciprocal-math (enables x/y -> x*(1/y))
-fassociative-math (enables e.g. a*b + c*b -> (a+c)*b)
These are used on Windows and HIP anyways, so our code can't expect exact IEEE
semantics in any case.
The only difference between the new set and -ffast-math is that we don't use
-ffinite-math-only since this causes issues with the BVH (see ce1f2e271d) and
breaks e.g. isnan.
This causes a ~1.5% speedup in my very quick test, but might be higher for some
more math-intensive cases.
Pull Request: https://projects.blender.org/blender/blender/pulls/128342