The goal is to reduce the affect of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
Basic idea is to benefit from SIMD vectorization on CPU.
Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize scalar version fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).
Pull Request: https://projects.blender.org/blender/blender/pulls/137109
Currently MetalRT interpolates transformation matrix on per-element basis
which leads to issues like #135659.
This change adds implementation of for decomposed (Scale/Rotate/Translate)
motion interpolation, matching behavior of BVH2 and other HW-RT.
This requires macOS 15 and Xcode 16 in order to use this interpolation.
On older platforms and compilers old interpolation is used.
Currently there is no changes on the user (by default) and it is only
available via CYCLES_METALRT_PCMI environment variable. This is because
there are some issues with complex motion paths that need to be looked
into. Having code available makes it easier to do further debugging.
Ref #135659
Authored by Emma Liu
Pull Request: https://projects.blender.org/blender/blender/pulls/136253
* Add SubdAttributeInterpolation class for linear attribute interpolation.
* Dicing computes ptex UV and face ID for interpolation.
* Simplify mesh storage of subd primitive counts
* Remove kernel code for subd attribute interpolation
* Remove patch table packing and upload
The old optimization adds a fair amount of complexity to the kernel, affecting
performance even when not using the feature. It's also not that useful as it
does not work for UVs that needs special interpolation. With this simpler code
it should be easier to make it feature complete.
Pull Request: https://projects.blender.org/blender/blender/pulls/135681
The attribute handling code in the kernel is currently highly duplicated since
it needs to handle five different data types and we couldn't use templates
back then.
We can now, so might as well make use of it and get rid of ~1000 lines.
There are also some small fixes for the GPU OSL code:
- Wrong derivative for .w component when converting float2/float3->float4
- Different conversion for float2->float (CPU averages, GPU used to take .x)
- Removed useless code for converting to float2, not used by OSL
Pull Request: https://projects.blender.org/blender/blender/pulls/134694
The main goal of these changes are to improve static (i.e. build-time)
checks on whether a given data can be allocated and freed with `malloc`
and `free` (C-style), or requires proper C++-style construction and
destruction (`new` and `delete`).
* Add new `MEM_malloc_arrayN_aligned` API.
* Make `MEM_freeN` a template function in C++, which does static assert on
type triviality.
* Add `MEM_SAFE_DELETE`, similar to `MEM_SAFE_FREE` but calling
`MEM_delete`.
The changes to `MEM_freeN` was painful and useful, as it allowed to fix a bunch
of invalid calls in existing codebase already.
It also highlighted a fair amount of places where it is called to free incomplete
type pointers, which is likely a sign of badly designed code (there should
rather be an API to destroy and free these data then, if the data type is not fully
publicly exposed). For now, these are 'worked around' by explicitly casting the
freed pointers to `void *` in these cases - which also makes them easy to search for.
Some of these will be addressed separately (see blender/blender!134765).
Finally, MSVC seems to consider structs defining new/delete operators (e.g. by
using the `MEM_CXX_CLASS_ALLOC_FUNCS` macro) as non-trivial. This does not
seem to follow the definition of type triviality, so for now static type checking in
`MEM_freeN` has been disabled for Windows. We'll likely have to do the same
with type-safe `MEM_[cm]allocN` API being worked on in blender/blender!134771
Based on ideas from Brecht in blender/blender!134452
Pull Request: https://projects.blender.org/blender/blender/pulls/134463
The original report stumbled upon this issue with a more tricky
configuration when light linking is combined with light tress.
However, the actual contributing factor was a mesh with emission
shader which is not assigned to any triangles. This triggered a
bug in the BoundBox::transformed() which converted non-valid bounds
to bounds by performing per-corner growing.
Additionally fix incorrect handling of shared nodes which only
worked for leaf nodes. This was due to the fact how the measure
was accumulated: it is possible that add() is called with an empty
measure.
Pull Request: https://projects.blender.org/blender/blender/pulls/134699
To align better with the pixel and reduce the samples needed.
The paper was using gamma because the jacobian |d_gamma/d_h| approaches
infinity at the boundaries, but it seems that clamping at 0.999 is
enough for numerical stability.
In practice I did not notice a change in the noise level, but it
simplifies the range computation and renders faster due to reduced
sample amount.
Co-authored-by: Olivier Maury <omaury@meta.com>
Ref: !129616
Pull Request: https://projects.blender.org/blender/blender/pulls/134130
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.
Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
The original paper uses the single scattering albedo `sigma_s/sigma_t`
to pick a channel for sampling the scattering distance. However, this
only considers the situation where there is scattering inside the volume.
If some channel has an extinction coefficient of zero, the light passes
through without attenuation for that channel. We assign such channel
with a weight of 1 instead of 0 to make sure it can be sampled.
Pull Request: https://projects.blender.org/blender/blender/pulls/131741
The `ProjectionTransform` object has no trivial copy-assignment constructor.
This results in the following warning on `gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0`:
```
/.../blender-git/blender/intern/cycles/kernel/../util/projection.h: In function ‘ccl::ProjectionTransform ccl::projection_inverse(ProjectionTransform)’:
/.../blender-git/blender/intern/cycles/kernel/../util/projection.h:219:9: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘ccl::ProjectionTransform’ {aka ‘struct ccl::ProjectionTransform’} with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
219 | memcpy(&tfmR, R, sizeof(R));
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~
/.../blender-git/blender/intern/cycles/kernel/../util/projection.h:67:16: note: ‘ccl::ProjectionTransform’ {aka ‘struct ccl::ProjectionTransform’} declared here
67 | typedef struct ProjectionTransform {
| ^~~~~~~~~~~~~~~~~~~
```
To fix the warning, cast the pointer to `(void *)`.
Pull Request: https://projects.blender.org/blender/blender/pulls/130321
All the OSL matrix functions had been implemented using the
`Transform` utility of Cycles, but that's built around a 4x3 matrix,
when the OSL matrix functions are working with 4x4 matrices.
This resulted in them not producing results consistent with the
CPU implementation.
This fixes that by making use of the `ProjectionTransform` utility
of Cycles instead, because it's built around a 4x4 matrix. Since
matrix inversion is required, I had to make a few more utility
functions available on the GPU (except Metal, due to use of
references/pointers without specification) that were previously
CPU-only.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/110102
This probably should always have been the value used, really.
Now, instead of reporting `Qualcomm Technologies Inc`, it reports the more informative `Snapdragon(R) X Elite - X1E78100 - Qualcomm(R) Oryon(TM) CPU` on a Thinkpad T14s Gen6 device.
Pull Request: https://projects.blender.org/blender/blender/pulls/128808