In detail:
- Direct accesses to state attributes are replaced with the INTEGRATOR_STATE and INTEGRATOR_STATE_WRITE macros.
- Unified the checks for the __PATH_GUIDING__ define to use `#if defined(__PATH_GUIDING__)`.
- Even if __PATH_GUIDING__ is defined, we now check at runtime whether the feature is enabled via `kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING`. This is important for later GPU ports.
- The kernel usage of the guiding field, surface, and volume sampling distributions is wrapped behind macros for each specific device (currently CPU only). This will make a later GPU port easier, as sketched below.
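As a rough sketch of the resulting pattern (illustrative, not the exact
Cycles code; guiding_record_surface_segment stands in for any guiding
call site):

    #if defined(__PATH_GUIDING__)
      /* Compile-time check: the build has guiding support at all. */
      if (kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING) {
        /* Run-time check: guiding is enabled for this render. */
        guiding_record_surface_segment(kg, state, sd);
      }
    #endif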
Embree 4.4 introduces an improvement in the Embree GPU
implementation by dropping shared memory usage in favor
of directly controllable memory transfers. This should allow
addressing several problems spotted in Blender regarding
multithreading and memory corruption when BVH build and rendering
happen at the same time. However, to implement such
improvements, the API has changed for several functions, and
this commit adapts the Blender code to these changes, making Blender
buildable and functional with all existing Embree 4.X
versions, before and after 4.4.
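A sketch of the version-guard pattern used for this (the function name
and argument shapes are placeholders, not the actual changed Embree
entry points; only the RTC_VERSION macro is real):

    #if RTC_VERSION >= 40400
      /* Embree 4.4+: explicit, controllable memory transfer. */
      rtc_commit_with_transfer(scene, &transfer_args);
    #else
      /* Embree 4.0-4.3: previous signature. */
      rtc_commit_with_transfer(scene);
    #endif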
No functional changes in Blender behavior are expected if
using Embree versions below 4.4.
Pull Request: https://projects.blender.org/blender/blender/pulls/139061
In Cycles, the convention is that reflection vs. refraction are classified
based on the hemisphere defined by the *shading* normal (N).
In general, most closure code uses the shading normal for most operations,
as is expected since using the geometric normal (Ng) would break normal maps
and smooth shading.
However, there are two places that use Ng: On the one hand, BSDF sampling
functions generally reject reflections that fall below the Ng hemisphere, since
they'd intersect the geometry when tracing the bounce. This is required, and
we can't do much about it.
On the other hand, the Microfacet evaluation code also checked that the ray
is in the same hemisphere w.r.t. both shading and geometric normal.
Theoretically, this is the right thing to do, since sampling and evaluation code
are supposed to be consistent. However, doing so breaks smooth shading, since
now direct light evaluation near the terminator will sometimes be rejected.
This didn't cause problems in practice because of another inconsistency: While
the parameter of the eval functions was named Ng, the caller actually provided
N (unclear whether by mistake or as a hacky workaround to the terminator).
When this was fixed in 063a9e89, users quickly reported issues with the shadow
terminator, so it was reverted to the hacky inconsistency in 1c50dd8b.
So, let's clean this mess up properly. If we don't want to do the Ng hemisphere
check in _eval, then instead of passing in a misleading value that ends up
making it a no-op, just remove the check. After all, the other closures don't
perform it either.
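For illustration, the removed check was of this shape (a sketch, not
the literal Cycles code):

    /* Formerly in microfacet _eval: require the light direction to lie
     * in the same hemisphere w.r.t. both the shading and the geometric
     * normal. */
    if (dot(Ng, wo) * dot(N, wo) <= 0.0f) {
      return zero_spectrum();
    }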
This way, we avoid the mislabeled Ng, we get rid of the special case for
microfacets, and the shadow terminator continues to be fine.
Technically, we still have the _sample vs. _eval mismatch. However, this is just
unavoidable, and is irrelevant in practice: For a strongly directional light
that makes the shadow terminator noticeable, the MIS weights will be massively
in favor of eval, to the point that it doesn't really matter what sample does.
To support this argument: You can actually reproduce a broken shadow terminator
in pretty much every Cycles version going back to 2011 by just setting up a
small intense mesh emitter, turning off MIS on it to disable _eval, and then
rendering a diffuse smooth-shaded sphere with >100000 samples so that the
fireflies resolve into somewhat consistent lighting.
If nobody has complained about this affecting all closures for 11 years,
I guess it's fine.
Pull Request: https://projects.blender.org/blender/blender/pulls/138632
This commit makes it so CUDA 11 is used to compile the compute_75
PTX CUDA kernels.
This is being done because PTX kernels have much stricter minimum
driver requirements than standard kernels, so using the latest CUDA
toolkit to compile PTX kernels can result in the PTX kernels being
inaccessible to users with drivers that are only a few months old.
This is important because in some situations it's either impossible
(e.g. when renting certain cloud services) or difficult to update the
GPU drivers, and we want to make sure the PTX kernels are usable by
as many people as possible.
Original Author: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/138879
Add a new shader node to control volume coefficients (scattering,
absorption and emission) directly, making it easier to model existing
volumes with measured data.
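For reference, the coefficients follow the usual radiative transfer
relations (plain identities, not new node behavior):

    sigma_t = sigma_a + sigma_s   (extinction = absorption + scattering)
    albedo  = sigma_s / sigma_t

so measured data given as extinction and single-scattering albedo
converts to the node's inputs with simple arithmetic.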
Pull Request: https://projects.blender.org/blender/blender/pulls/136287
Note: this is a partial fix that makes NEE and forward path consistent
only when `max_transparent_bounce > 0`. It is much more involved to make
forward path tracing support a max transparent bounce of 0, but since we
don't expect people to set up a very low number of transparent bounces,
it is less important to support that specific case.
Pull Request: https://projects.blender.org/blender/blender/pulls/138098
The issue here is that the automatic bump shader in OSL adjusts globals.N,
which is used to define each closure's shading normal, but sd->N remains
as it was (unlike SVM, which sets it from svm_node_set_normal).
This is a problem since some code, like the shadow terminator stuff, uses
sd->N, so to make behavior consistent the fix is to set it from OSL as well.
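A sketch of the fix (names approximate; the point is the copy-back
after OSL evaluation):

    /* After running the OSL bump/displacement evaluation, mirror the
     * adjusted shading normal back into ShaderData, matching what SVM
     * does via svm_node_set_normal. */
    sd->N = globals->N;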
Pull Request: https://projects.blender.org/blender/blender/pulls/138092
On the one hand, this improves initialization time since we don't need to
load/compile the full OSL module with all the shading logic if we're only
using a custom camera with SVM shading.
On the other hand, it also fixes a bug I noticed while preparing test scenes:
The AO and Bevel nodes don't work when using custom cameras with SVM on OptiX.
The issue there is that those two are handled by the SHADE_SURFACE_RAYTRACE
kernel, but since that one has intersection logic, we use the OptiX-specific
kernel even if OSL shading is disabled.
However, with the previous unified OSL module, this would mean loading
SHADE_SURFACE_RAYTRACE from kernel_osl.cu, which has `#undef __SVM__` and
therefore doesn't handle them correctly.
With this change, we'll use the kernels from kernel_shader_raytrace.cu in that
case, which do support SVM nodes just fine.
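The resulting module choice, roughly (load_module is an illustrative
placeholder; the module names are the real ones mentioned above):

    if (use_osl_shading) {
      /* Full OSL module; note it has `#undef __SVM__`. */
      load_module("kernel_osl.ptx");
    }
    else {
      /* SVM shading; raytrace kernels from kernel_shader_raytrace.cu. */
      load_module("kernel_shader_raytrace.ptx");
      if (use_osl_camera) {
        load_module("kernel_optix_osl_camera.ptx");
      }
    }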
Disk usage of the new kernel_optix_osl_camera.ptx.zst file is 30KB, so this
also doesn't blow up the kernel disk size (and kernel_optix_osl.ptx.zst is
probably smaller by that amount now).
Since it seems that we can mix modules just fine, I'm suspecting that we could
split the modules properly (intersection, SVM shading with raytracing,
OSL shading, OSL camera), instead of the current approach where modules
essentially correspond to feature set tiers and each includes the previous
one's kernels as well - but that's a separate refactor.
Pull Request: https://projects.blender.org/blender/blender/pulls/138021
osl_transform_triple(), osl_transform_dvmdv() and so on are supposed to apply
the given transform in the context of OSL's auto-differentiation system.
Therefore, the given input is a dual vector, containing both the value as v[0]
and its derivatives w.r.t. X and Y in v[1] and v[2].
However, the existing code treated these as a simple list of vectors,
applying the same operation to all three instead of propagating the
derivatives. On top of that, it also treated the given matrix input as
if there were three of them, which isn't the case.
Therefore, this commit replaces the implementation to do the right thing.
The Vector and Normal cases are straightforward: since the operation is
linear, applying the same transform to all three vectors works.
The Point case is a bit more complicated, but not too bad when written out.
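In code, the fixed Point case looks roughly like this (a sketch using
the Cycles transform helpers):

    /* The value transforms with the full affine matrix, translation
     * included. */
    p[0] = transform_point(&tfm, p[0]);
    /* The derivatives are differences of points, so the translation
     * cancels and only the linear part of the transform applies. */
    p[1] = transform_direction(&tfm, p[1]);
    p[2] = transform_direction(&tfm, p[2]);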
This bug mostly became apparent when using Object or Camera texture coordinates
with a Bump node, since that node uses OSL differentials and Object/Camera
coordinates are implemented using transform().
I'm pretty sure that all the other builtin functions (e.g. sin) at the bottom
of services_gpu.h have the same problem, but one thing at a time...
Pull Request: https://projects.blender.org/blender/blender/pulls/138045
Previously the spell checker ignored text in single quotes; however,
this meant incorrect spelling was ignored in text where it shouldn't
have been.
In cases where single quotes were used for literal strings
(such as variables, code & compiler flags),
replace these with back-ticks.
In cases where they were used for UI labels,
replace these with double quotes.
In cases where they were used to reference symbols,
replace them with doxygen's symbol link syntax (leading hash).
Apply some spelling corrections & tweaks (for check_spelling_* targets).
This allows users to implement arbitrary camera models using OSL by writing
shaders that take an image position as input and compute ray origin and
direction.
The obvious applications for this are e.g. panorama modes, lens distortion
models and realistic lens simulation, but the possibilities are endless.
Currently, this is only supported on devices with OSL support, i.e. CPU
and OptiX. However, it is independent of the shading model used, so
custom cameras can be used without the performance hit of OSL shading.
A few samples are provided as Text Editor templates.
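As a flavor of what such a shader computes, here is an equirectangular
mapping written as C++-style pseudocode (the shipped templates are OSL
shaders; all names here are illustrative):

    /* Map normalized image coordinates (u, v) in [0, 1] to a ray. */
    const float theta = M_PI_F * (v - 0.5f);       /* latitude */
    const float phi = 2.0f * M_PI_F * (u - 0.5f);  /* longitude */
    ray_origin = make_float3(0.0f, 0.0f, 0.0f);
    ray_direction = make_float3(cosf(theta) * cosf(phi),
                                cosf(theta) * sinf(phi),
                                sinf(theta));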
One notable current limitation (in addition to the limited device support)
is that inverse mapping is not supported, so Window texture coordinates and
the Vector pass will not work with custom cameras.
Pull Request: https://projects.blender.org/blender/blender/pulls/129495
In forward path tracing, when we pass volume bounding meshes, we
accumulate `volume_bounds_bounce`. We should match this behaviour in NEE
instead of accumulating `transparent_bounce`.
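A sketch of the NEE-side change (the counter names are from the commit,
the surrounding logic is approximate):

    if (sd->flag & SD_HAS_ONLY_VOLUME) {
      /* The surface only bounds a volume: count it the way the forward
       * path does. */
      volume_bounds_bounce++;
    }
    else {
      transparent_bounce++;
    }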
Pull Request: https://projects.blender.org/blender/blender/pulls/137556
This commit fixes an issue where Cycles adaptive kernel compilation
would always undefine adaptive kernel features, resulting in various
issues such as incorrect renders.
Pull Request: https://projects.blender.org/blender/blender/pulls/137804
The crash was caused by an overflow in the opgl_path_segment_storage
array. It happened because shadow catcher paths would write to the
segments but not clear them. This made it so the next render loop
iteration for the main path starts with non-empty segments in the
guiding data.
Disable training when the megakernel is called for the shadow catcher
state.
To ensure this issue is not forgotten when guiding is ported to the
GPU, add asserts in `guiding.h`. While asserts are a no-op for default
GPU kernels, we sometimes compile debug kernels, and they also act as
a plain-text reminder to the future us working on this code.
There is now also an assert before the main path megakernel to help
catch such cases in the future.
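The asserts have roughly this shape (a sketch; the exact emptiness
query is approximate):

    /* The segment storage must be empty when a new main path starts;
     * leftover segments mean a previous path did not clear them. */
    kernel_assert(kg->opgl_path_segment_storage == nullptr ||
                  kg->opgl_path_segment_storage->GetNumSegments() == 0);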
Pull Request: https://projects.blender.org/blender/blender/pulls/137291
The goal is to reduce the impact of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
The basic idea is to benefit from SIMD vectorization on the CPU.
Tested on Linux with an i9-11900K and on macOS with an M2 Ultra; in
both cases performance after this change is very close to what it
could be with the fmod() call itself (`p = p + precision_correction`)
commented out.
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible
to gain some performance for the 1d and 2d cases as well, but the
approach would need to be different: we'd need to optimize the scalar
fmodf(). Maybe tricks with an integer cast would be faster (since we
are a bit optimistic in the kernel and do not guarantee exact behavior
in extreme cases such as NaN inputs).
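The usual way to make this vectorizable is to express fmod through
floor (a sketch, not necessarily the committed code):

    /* SIMD-friendly fmod: plain arithmetic the compiler can vectorize,
     * at the cost of exactness in extreme cases (NaN, huge inputs). */
    ccl_device_inline float4 fmod_vectorizable(const float4 x, const float y)
    {
      return x - floor(x / y) * y;
    }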
Pull Request: https://projects.blender.org/blender/blender/pulls/137109
Commit be63ebd961
This is causing issues with CUDA kernel compilation in some setups, even
though the buildbot is fine. Since this isn't working for oneAPI yet
anyway, revert all the changes to Cycles kernel compilation for now.
Use VERBATIM to ensure spaces inside command line arguments don't get
escaped automatically.
On Linux and Windows the oneAPI kernel compilation still has problems.
There is an apparent bug with single quote escaping in add_custom_command
which means it's not easy to use VERBATIM.
The current logic disables WITH_CYCLES_ONEAPI_BINARIES when ocloc is not
found, which is fine, but it prevented building for other non-Intel SYCL
targets without (unnecessary) ocloc.
The fix here is to remove the spir64_gen target when
WITH_CYCLES_ONEAPI_BINARIES is disabled, instead of forcing only spir64.
Reduce the register pressure and branching in the switch() by using
subclasses and casting from void* to the base class.
This ensures intersection functions are not inlined multiple times,
bringing performance back.
An alternative could be to avoid functions (they are quite large), but
that only partially resolves the performance regression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136823
The initial issues that led to forcing the use of linker.exe seem to be
gone, and there is currently no strong reason to use linker.exe
explicitly, so let's simplify and use the default setting.
The initial limitation that prevented using -ffast-math, worked around
in 09df1f4caf, got fixed upstream in LLVM,
and the fix is part of the current DPC++ compiler:
63ecd2a725
We're now able to go back to using -ffast-math, which helps simplify
the set of compiler flags.
No performance or conformance change is expected from this change (most
of the gain was already achieved with the use of -cl-fast-relaxed-math
since 284b89a0a3), and this has been
verified on an Arc B580 under Windows.
HIP-RT functions do have access to kg, but it was used inconsistently:
some functions were passed the actual kg, others were passed nullptr.
This change makes it consistent and passes kg everywhere.
Pull Request: https://projects.blender.org/blender/blender/pulls/136503
The code before this change relied on ShadowPayload having the same
"header" as RayPayload for some of the primitive types (curve, motion
triangle, point): intersection functions were shared between "regular"
and shadow rays (shadow in this case being shadow_all), but an extra
filter function was used for shadow rays.
This is fragile if someone changes one of these structures. What is
worse, the compiler might actually decide to shuffle things in some
structs, or remove unused fields.
This change also resolves the confusion about ShadowPayload::prim_type
seemingly only ever being assigned PRIMITIVE_NONE. With time it is not
impossible that the compiler will also see this and constant-fold some
checks, or even remove the field. If that happens, the render result
will be wrong. Maybe it is already happening, as there are some GPU-,
driver-, and optimization-flag-specific bugs in this area.
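A sketch of the safer layout (simplified; all field names illustrative
except prim_type):

    /* The shared "header" becomes an explicit base class, so shared
     * intersection functions take the base, and the compiler can no
     * longer silently break the layout agreement between the two. */
    struct PayloadBase {
      int prim_type;
      /* ...other shared fields... */
    };
    struct RayPayload : PayloadBase {
      /* Regular-ray state. */
    };
    struct ShadowPayload : PayloadBase {
      /* shadow_all-ray state. */
    };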
It is unclear whether it was causing any actual problem: a W7800 seems
to render all hair correctly on Linux.
Also make some style decisions more consistent: for example, the way
the stop/continue-search return value is commented. Prefer lower
vertical space for those.
Mainly for readability purposes:
- Having variables called local_payload is ambiguous: does it refer to
the LocalPayload type, or to a variable being local to a function?
- Some of the functions are used for different ray types, so having the
ray type in the intersectFunc and filterFunc names makes them easier
to scan.
For the latter: now it is more obvious that Curve_Intersect_Shadow
expects RayPayload, but Curve_Filter_Shadow expects ShadowPayload.
It might not be a problem currently, as ShadowPayload has the same
"header" as RayPayload, but that might change in the future. Also, the
compiler might optimize fields out of one but not the other.
There is a known precision bug in the current HIP compiler version
(RDNA2 family/Windows) that has already been fixed and will be
available in a future HIP SDK release. Enabling more precise math
prevents the artifacts.
This may cause a 5-10% performance drop in some scenes.
Fix #136138: Microfacet BSDF
Fix #136449: Hair BSDF
Pull Request: https://projects.blender.org/blender/blender/pulls/136341
The new correction avoids washed out areas near the shadow terminator,
preserving more detail from normal and bump maps.
It implements the method from the paper "A Microfacet-Based Shadowing
Function to Solve the Bump Terminator Problem" by Alejandro Conty Estevez,
Pascal Lecocq, and Clifford Stein.
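The shadowing term is a GGX-style Smith G1 whose roughness is derived
from the angle between the shading and geometric normals; roughly
following the paper's listing (a sketch, not the literal Cycles code):

    ccl_device float bump_shadowing_term(float3 Ng, float3 N, float3 I)
    {
      /* Roughness proxy from the shading/geometric normal angle. */
      const float cos_d = min(fabsf(dot(Ng, N)), 1.0f);
      const float tan2_d = (1.0f - cos_d * cos_d) / (cos_d * cos_d);
      const float alpha2 = clamp(0.125f * tan2_d, 0.0f, 1.0f);

      /* Smith G1 for the light direction I. */
      const float cos_i = max(fabsf(dot(Ng, I)), 1e-6f);
      const float tan2_i = (1.0f - cos_i * cos_i) / (cos_i * cos_i);
      return 2.0f / (1.0f + sqrtf(1.0f + alpha2 * tan2_i));
    }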
Pull Request: https://projects.blender.org/blender/blender/pulls/135380