47 Commits

Author SHA1 Message Date
Sergey Sharybin
8e49bc4a05 Refactor: Make Cycles shadow linking primitives receive ray self primitives
No functional changes.

Makes it closer to other self-intersection checks, making it easier to
re-use functions from the HW RT kernels.

Pull Request: https://projects.blender.org/blender/blender/pulls/111971
2023-09-06 09:53:29 +02:00
Sergey Sharybin
71b4a97cbc Refactor: De-duplicate Metal RT self intersection checks
Use the common BVH utilities header for this.

Added a special type qualifier ccl_ray_data which is defined to ccl_private
for all platforms but Metal. On Metal it is defined to ray_data.
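
A minimal sketch of how such a qualifier could be defined. It assumes the `__KERNEL_METAL__` guard that other Cycles commits mention and an empty `ccl_private` on the host. `RaySelfSketch` and `read_self_prim` are hypothetical names used only for illustration, not the actual Cycles code:

```cpp
#include <cassert>

// Assumption: on non-Metal host builds ccl_private expands to nothing.
#ifndef ccl_private
#  define ccl_private
#endif

// ccl_ray_data maps to Metal's ray_data address space on Metal and to
// ccl_private on every other platform, as the commit message describes.
#ifdef __KERNEL_METAL__
#  define ccl_ray_data ray_data
#else
#  define ccl_ray_data ccl_private
#endif

// Hypothetical payload type, for illustration only.
struct RaySelfSketch {
  int prim;
  int object;
};

// A shared helper can accept the payload through the portable qualifier,
// so the same source compiles for Metal and non-Metal kernels.
inline int read_self_prim(ccl_ray_data const RaySelfSketch *self)
{
  assert(self != nullptr);
  return self->prim;
}
```

On a non-Metal build the qualifier disappears entirely, so calling `read_self_prim(&self)` behaves like any ordinary pointer-taking function.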

The tricky part is that the BVH utilities are wrapped into the Metal context
class. In some of the BVH functions the context has been already constructed,
but it wasn't done in all the callbacks.

From a quick render test of the Junkshop benchmark scene there is no render
time difference.

No functional changes are expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/111967
2023-09-05 17:21:49 +02:00
Campbell Barton
0caf227530 License headers: use SPDX-FileCopyrightText for *.inl and *.osl files 2023-08-04 13:24:17 +10:00
Campbell Barton
c12994612b License headers: use SPDX-FileCopyrightText in intern/cycles 2023-06-14 16:53:23 +10:00
Sergey Sharybin
ba3f26fac5 Cycles: light and shadow linking
With light linking, lights can be set to affect only specific objects in the
scene. Shadow linking additionally gives control over which objects act as
shadow blockers for a light.

Usage:
https://wiki.blender.org/wiki/Reference/Release_Notes/4.0/Cycles

Implementation:
https://wiki.blender.org/wiki/Source/Render/Cycles/LightLinking

Ref #104972
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2023-05-24 14:11:47 +02:00
Campbell Barton
6859bb6e67 Cleanup: format (with BraceWrapping::AfterControlStatement "MultiLine") 2023-05-02 09:37:49 +10:00
William Leeson
6c03339e48 Cycles: reduce mesh memory usage by unflattening
Improves mesh upload speeds and reduces the size of the scene data, which allows larger scenes to be rendered.

The meshes in Cycles are currently stored as flattened meshes, where each triangle is stored as a set of 3 vertices. Flattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and for current hardware does not provide a noticeable benefit. This change unflattens the mesh by using the mesh's vertex and index buffers directly and skipping the flattening step. This allows for larger scenes and reduces the size of the meshes. It also decreases the time it takes to upload the data to a GPU, which is especially important when multiple GPUs are used in a single machine.
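
The difference between the two layouts can be sketched as follows. This is an illustration under stated assumptions, not the Cycles data structures: `Vec3Sketch`, `flatten`, `triangle_vertex`, `flattened_bytes` and `indexed_bytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3Sketch {
  float x, y, z;
};

// Flattened layout: one vertex copy per triangle corner, written out in
// index-buffer order. Shared vertices are duplicated.
inline std::vector<Vec3Sketch> flatten(const std::vector<Vec3Sketch> &verts,
                                       const std::vector<unsigned> &indices)
{
  std::vector<Vec3Sketch> flat;
  flat.reserve(indices.size());
  for (unsigned i : indices) {
    assert(i < verts.size());
    flat.push_back(verts[i]);
  }
  return flat;
}

// Unflattened (indexed) layout: fetch a triangle corner through the
// index buffer at lookup time instead of storing a copy.
inline const Vec3Sketch &triangle_vertex(const std::vector<Vec3Sketch> &verts,
                                         const std::vector<unsigned> &indices,
                                         std::size_t tri,
                                         int corner)
{
  return verts[indices[tri * 3 + corner]];
}

// Storage cost of each layout in bytes.
inline std::size_t flattened_bytes(std::size_t num_triangles)
{
  return num_triangles * 3 * sizeof(Vec3Sketch);
}

inline std::size_t indexed_bytes(std::size_t num_verts, std::size_t num_triangles)
{
  return num_verts * sizeof(Vec3Sketch) + num_triangles * 3 * sizeof(unsigned);
}
```

The indexed layout pays for one extra indirection per lookup, but the saving grows with the number of triangles that share each vertex.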

Pull Request #105173
2023-02-27 10:39:19 +01:00
Patrick Mours
e6b38deb9d Cycles: Add basic support for using OSL with OptiX
This patch generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.

Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.

Maniphest Tasks: T101222

Differential Revision: https://developer.blender.org/D15902
2022-11-09 15:30:21 +01:00
Lukas Stockner
e2a93e9c7c Fix T94136: Cycles: No Hair Shadows with Transparent BSDF 2022-10-20 04:47:21 +02:00
Brecht Van Lommel
fa514564b0 Fix T99201: Cycles render difference with 3D hair curves between OptiX and Embree
It should consistently use the Cycles primitive ID for self intersection detection,
not the one from the OptiX or Embree acceleration structure.

Differential Revision: https://developer.blender.org/D15632
2022-08-05 15:03:47 +02:00
Brecht Van Lommel
38af5b0501 Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX
Simplifies intersection code a little and slightly improves precision regarding
self intersection.
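
For reference, the Embree/OptiX triangle convention gives `u` the weight of the second vertex and `v` the weight of the third, with the first vertex weighted by `1 - u - v`. A minimal sketch, where `Vec3Sketch` and `point_from_barycentric` are hypothetical names and not the Cycles functions:

```cpp
#include <cassert>

struct Vec3Sketch {
  float x, y, z;
};

// Embree/OptiX convention: P = (1 - u - v) * v0 + u * v1 + v * v2.
inline Vec3Sketch point_from_barycentric(
    Vec3Sketch v0, Vec3Sketch v1, Vec3Sketch v2, float u, float v)
{
  assert(u >= 0.0f && v >= 0.0f && u + v <= 1.0f);
  const float w = 1.0f - u - v;
  return {w * v0.x + u * v1.x + v * v2.x,
          w * v0.y + u * v1.y + v * v2.y,
          w * v0.z + u * v1.z + v * v2.z};
}
```

With this convention `(u, v) = (0, 0)` lands on the first vertex, `(1, 0)` on the second and `(0, 1)` on the third.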

The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
2022-07-27 21:03:33 +02:00
Campbell Barton
f1f89ca751 Cleanup: spelling in comments 2022-07-26 13:21:21 +10:00
Brecht Van Lommel
7a74d91e32 Cleanup: move device BVH code to kernel/device/*/bvh.h
Having the OptiX/MetalRT/Embree implementations all in one file with
many #ifdefs became too confusing. Instead split it up per device, and also
move it together with device specific hit/filter/intersect functions and
associated data types.
2022-07-25 16:34:22 +02:00
Brecht Van Lommel
484ad31653 Cycles: simplify handling of ray distance in GPU rendering
All our intersection functions now work with unnormalized ray direction,
which means we no longer need to transform ray distance between world and
object space, they can all remain in world space.
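
Why the distance can stay in world space can be sketched with a toy transform. The example below reduces the world-to-object transform to a uniform scale for brevity, which is an assumption of this sketch; all names (`Vec3Sketch`, `RaySketch`, `to_object_space`, `point_at`) are hypothetical:

```cpp
#include <cassert>

struct Vec3Sketch {
  float x, y, z;
};

struct RaySketch {
  Vec3Sketch P;  // origin
  Vec3Sketch D;  // direction, not necessarily unit length
};

inline Vec3Sketch scale(Vec3Sketch a, float s)
{
  return {a.x * s, a.y * s, a.z * s};
}

inline Vec3Sketch point_at(const RaySketch &ray, float t)
{
  return {ray.P.x + ray.D.x * t, ray.P.y + ray.D.y * t, ray.P.z + ray.D.z * t};
}

// Transform a world-space ray into object space. The direction is
// deliberately NOT renormalized, so a given t refers to the same point
// along the ray in both spaces.
inline RaySketch to_object_space(const RaySketch &world, float world_to_object_scale)
{
  return {scale(world.P, world_to_object_scale), scale(world.D, world_to_object_scale)};
}
```

Because the transform is linear, the object-space point at distance `t` equals the transformed world-space point at the same `t`, so no distance conversion is needed.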

There doesn't seem to be any real performance difference one way or the
other, but it does simplify the code.
2022-07-25 13:27:40 +02:00
Brecht Van Lommel
5152c7c152 Cycles: refactor rays to have start and end distance, fix precision issues
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and be
sure it works as the distance compared to will be exactly the same as
before, due to the ray start position and direction remaining the same.
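
The idea can be sketched with the standard library. This is an illustration only: `RaySketch`, `continue_past_hit` and `in_ray_interval` are hypothetical names, and `std::nextafter` stands in for whatever helper Cycles actually uses to step to the next float.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

struct RaySketch {
  float P[3];  // start position, left untouched
  float D[3];  // direction, left untouched
  float tmin;  // start distance
  float tmax;  // end distance
};

// Continue a ray past a hit at distance hit_t by moving only the start
// distance to the next representable float above hit_t.
inline void continue_past_hit(RaySketch &ray, float hit_t)
{
  assert(hit_t < ray.tmax);
  ray.tmin = std::nextafter(hit_t, FLT_MAX);
}

// Interval test assumed by this sketch.
inline bool in_ray_interval(const RaySketch &ray, float t)
{
  return t >= ray.tmin && t <= ray.tmax;
}
```

Since position and direction are unchanged, re-intersecting the same surface yields the same distance as before, which now falls just outside the interval.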

Fix T98764, T96537, hair ray tracing precision issues.

Differential Revision: https://developer.blender.org/D15455
2022-07-15 18:46:24 +02:00
Brecht Van Lommel
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
  these are in a struct.
2022-06-20 12:30:48 +02:00
Brecht Van Lommel
f2cd7e08fe Fix Cycles MNEE not working for Metal
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.

We can experiment with bigger kernel organization changes later.

Differential Revision: https://developer.blender.org/D15070
2022-05-31 17:24:43 +02:00
Stefan Werner
65dcb5ebd3 Cycles: Semantically separate 2D and 3D texture objects
Currently there are no functional changes.

Preparing for an upcoming oneAPI integration where such separation
in types is needed.
2022-04-01 19:44:31 +02:00
Brecht Van Lommel
076079454f Cleanup: remove some unused Cycles GPU code
To make porting to other architectures easier, clarifying that this does not
need to be supported. The unused parallel_reduce implementation assumed warp
size 32, but is easy to update if we ever need it in the future.
2022-03-16 18:05:08 +01:00
Brecht Van Lommel
9cfc7967dd Cycles: use SPDX license headers
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
  to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.

Ref D14069, T95597
2022-02-11 17:47:34 +01:00
William Leeson
ae44070341 Cycles: explicitly skip self-intersection
Remember the last intersected primitive and skip any intersections with the
same primitive.
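
A minimal sketch of the approach, assuming a primitive is identified by an object and primitive index pair. The types and functions here (`SelfPrimSketch`, `HitSketch`, `skip_self`, `closest_hit`) are hypothetical and do not mirror the Cycles implementation:

```cpp
#include <cassert>

constexpr int PRIM_NONE_SKETCH = -1;

// The primitive the ray started from, remembered from the previous hit.
struct SelfPrimSketch {
  int object;
  int prim;
};

struct HitSketch {
  float t;
  int object;
  int prim;
};

// True when a candidate hit is the primitive the ray originated from.
inline bool skip_self(const SelfPrimSketch &self, int object, int prim)
{
  return self.prim != PRIM_NONE_SKETCH && self.object == object && self.prim == prim;
}

// Return the index of the closest hit that is not the originating
// primitive, or -1 if there is none.
inline int closest_hit(const HitSketch *hits, int num_hits, const SelfPrimSketch &self)
{
  assert(num_hits >= 0);
  int best = -1;
  for (int i = 0; i < num_hits; i++) {
    if (skip_self(self, hits[i].object, hits[i].prim)) {
      continue;
    }
    if (best < 0 || hits[i].t < hits[best].t) {
      best = i;
    }
  }
  return best;
}
```

The check is exact, so it does not depend on an epsilon offset that might be too small for large scenes or too large for fine detail.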

Ref D12954
2022-01-26 17:51:05 +01:00
Brecht Van Lommel
e2e7f7ea52 Fix Cycles OptiX crash with 3D curves after point cloud changes
Includes refactoring to reduce the number of bits taken by primitive types,
so they more easily fit in the OptiX limit.
2021-12-20 14:14:43 +01:00
Brecht Van Lommel
35b1e9fc3a Cycles: pointcloud rendering
This adds support for rendering of the point cloud object in Blender, as a native
geometry type in Cycles that is more memory and time efficient than instancing
sphere meshes. This can be useful for rendering sand, water splashes, particles,
motion graphics, etc.

Points are currently always rendered as spheres, with backface culling. More
shapes are likely to be added later, but this is the most important one and can
be customized with shaders.

For CPU rendering the Embree primitive is used, for GPU there is our own
intersection code. Motion blur is supported. Volumes inside points are not
currently supported.

Implemented with help from:
* Kévin Dietrich: Alembic procedural integration
* Patrick Mours: OptiX integration
* Josh Whelchel: update for cycles-x changes

Ref T92573

Differential Revision: https://developer.blender.org/D9887
2021-12-16 20:54:04 +01:00
Brecht Van Lommel
4fac3be146 Fix Cycles OptiX doing a bit too much work for almost opaque curve shadows
Found in D13353, likely has no significant impact on performance.
2021-11-29 18:41:37 +01:00
Michael Jones
f613c4c095 Cycles: MetalRT support (kernel side)
This patch adds MetalRT support to Cycles kernel code. It is mostly additive in nature or confined to Metal-specific code, however there are a few areas where this interacts with other code:

- MetalRT closely follows the OptiX implementation, and in some cases (notably handling of transforms) it makes sense to extend OptiX special-casing to MetalRT. For these generalisations we now have `__KERNEL_GPU_RAYTRACING__` instead of `__KERNEL_OPTIX__`.
- MetalRT doesn't support primitive offsetting (as with `primitiveIndexOffset` in OptiX), so we define and populate a new kernel texture, `__object_prim_offset`, containing per-object primitive / curve-segment offsets. This is referenced and applied in MetalRT intersection handlers.
- Two new BVH layout enum values have been added: `BVH_LAYOUT_METAL` and `BVH_LAYOUT_MULTI_METAL_EMBREE` (for XPU mode). Some host-side enum case handling has been updated where it is trivial to do so.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13353
2021-11-29 15:20:26 +00:00
Patrick Mours
7a97e925fd Cycles: Add support for building with OptiX 7.4 SDK and use built-in catmull-rom curve type
Some enum names were changed/removed in OptiX 7.4, so some changes are necessary to
make things compile still.
In addition, OptiX 7.4 also adds built-in support for catmull-rom curves, so it is no longer
necessary to convert the catmull-rom data to cubic bsplines first, and has endcaps disabled
by default now, so can remove the special handling via any-hit programs that filtered them
out before.

Differential Revision: https://developer.blender.org/D13351
2021-11-24 16:33:04 +01:00
Brecht Van Lommel
1b94c53aa6 Cleanup: fix typos in comments and docs
Contributed by luzpaz.

Differential Revision: https://developer.blender.org/D10447
2021-11-19 13:02:16 +01:00
Michael Jones
d1f944c186 Cycles: declare constants at program scope on Metal
MSL requires that constant address space literals be declared at program
scope. This patch moves the `blackbody_table_r/g/b` and `cie_colour_match`
constants into separate files so they can be declared at the appropriate scope.

Ref T92212

Differential Revision: https://developer.blender.org/D13241
2021-11-18 14:38:05 +01:00
Michael Jones
d19e35873f Cycles: several small fixes and additions for MSL
This patch contains many small leftover fixes and additions that are
required for Metal-enablement:

- Address space fixes and a few other small compile fixes
- Addition of missing functionality to the Metal adapter headers
- Addition of various scattered `__KERNEL_METAL__` blocks (e.g. for
  atomic support & maths functions)

Ref T92212

Differential Revision: https://developer.blender.org/D13263
2021-11-18 14:38:02 +01:00
Brecht Van Lommel
063ad8635e Cycles: reduce triangle memory usage with packed_float3
Depends on D13243

Differential Revision: https://developer.blender.org/D13244
2021-11-17 17:29:41 +01:00
Brecht Van Lommel
9937d5379c Cycles: add packed_float3 type for storage
Introduce a packed_float3 type for smaller storage that is exactly 3
floats, instead of 4. For computation float3 is still used since it can
use SIMD instructions.
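
The storage/computation split can be sketched like this. The `_sketch` types below are hypothetical stand-ins; the 16-byte alignment of the compute type is an assumption made to mimic a SIMD-friendly layout:

```cpp
#include <cassert>
#include <cstddef>

// Compute type: padded to 4 floats so it can map to a SIMD register.
struct alignas(16) float3_sketch {
  float x, y, z, w;
};

// Storage type: exactly 3 floats, no padding.
struct packed_float3_sketch {
  float x, y, z;

  packed_float3_sketch() = default;

  // Pack for storage by dropping the padding component.
  packed_float3_sketch(const float3_sketch &a) : x(a.x), y(a.y), z(a.z) {}

  // Unpack for computation.
  operator float3_sketch() const
  {
    return {x, y, z, 0.0f};
  }
};

static_assert(sizeof(packed_float3_sketch) == 3 * sizeof(float), "storage type is 3 floats");
static_assert(sizeof(float3_sketch) == 4 * sizeof(float), "compute type is 4 floats");
```

Arrays of the packed type take three quarters of the memory, while arithmetic still happens on the padded type after an implicit conversion.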

Ref T92212

Differential Revision: https://developer.blender.org/D13243
2021-11-17 17:29:41 +01:00
Sergey Sharybin
ce395c84a3 Merge branch 'blender-v3.0-release' 2021-11-11 15:29:35 +01:00
Sergey Sharybin
d26d3cfe19 Fix T92868: Cycles catcher with transparency crashes
The issue was caused by splitting happening twice.

Fixed by checking for split flag which is assigned to both states
during split.

The tricky part was to write catcher data at the moment of split: the
transparency and shadow catcher sample count is to be accumulated at
that point. Now it is happening in the `intersect_closest` kernel.
The downside is that render buffer is to be passed to the kernel, but
the benefit is that extra split bounce check is not needed now.

Had to move the passes write to shadow catcher header, since include
of `film/passes.h` causes all the fun of requirement to have BSDF
data structures available.

Differential Revision: https://developer.blender.org/D13177
2021-11-11 15:21:35 +01:00
Michael Jones
3a4c8f406a Cycles: Adapt shared kernel/device/gpu layer for MSL
This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code.

In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL.

MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness.

Lambda expressions are not supported on MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object, optionally capturing any required state. This yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to use function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions.
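
The lambda-to-function-object rewrite can be sketched in plain C++ without the macro. This is not the expansion of `ccl_gpu_kernel_lambda`; `IsAboveThresholdSketch` and `count_if_sketch` are hypothetical names chosen for illustration:

```cpp
#include <cassert>

// Equivalent of the lambda
//   [threshold](int state) { return state > threshold; }
// written as an explicit function object. The captured state becomes a
// data member and the body becomes operator().
struct IsAboveThresholdSketch {
  int threshold;

  bool operator()(int state) const
  {
    return state > threshold;
  }
};

// A routine templated by operation accepts either a lambda or the
// function object, so call sites do not need to change.
template<typename Op> inline int count_if_sketch(const int *states, int n, Op op)
{
  assert(n >= 0);
  int count = 0;
  for (int i = 0; i < n; i++) {
    if (op(states[i])) {
      count++;
    }
  }
  return count;
}
```

A call such as `count_if_sketch(states, n, IsAboveThresholdSketch{2})` then behaves the same as passing the corresponding lambda.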

A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to work around some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13109
2021-11-09 21:43:10 +00:00
Brecht Van Lommel
fd25e883e2 Cycles: remove prefix from source code file names
Remove prefix of filenames that is the same as the folder name. This used
to help when #includes were using individual files, but now they are always
relative to the cycles root directory and so the prefixes are redundant.

For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
2021-10-26 15:37:04 +02:00
Brecht Van Lommel
d7d40745fa Cycles: changes to source code folders structure
* Split render/ into scene/ and session/. The scene/ folder now contains the
  scene and its nodes. The session/ folder contains the render session and
  associated data structures like drivers and render buffers.
* Move top level kernel headers into new folders kernel/camera/, kernel/film/,
  kernel/light/, kernel/sample/, kernel/util/
* Move integrator related kernel headers into kernel/integrator/
* Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/

For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
2021-10-26 15:36:39 +02:00
Brecht Van Lommel
282516e53e Cleanup: refactor float/half conversions for clarity 2021-10-22 13:03:03 +02:00
Brecht Van Lommel
fd77a28031 Cycles: bake transparent shadows for hair
These transparent shadows can be expensive to evaluate. Especially on the
GPU they can lead to poor occupancy when only some pixels require many kernel
launches to trace and evaluate many layers of transparency.

Baked transparency allows tracing a single ray in many cases by accumulating
the throughput directly in the intersection program without recording hits
or evaluating shaders. Transparency is baked at curve vertices and
interpolated, for most shaders this will look practically the same as actual
shader evaluation.
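
A simplified sketch of the accumulation, assuming a scalar transparency per curve vertex (a real renderer would more likely use a color). `CurveShadowHitSketch`, `lerp_sketch` and `accumulate_shadow_throughput` are hypothetical names:

```cpp
#include <cassert>

inline float lerp_sketch(float a, float b, float t)
{
  return a + (b - a) * t;
}

// One shadow-ray hit on a curve segment: baked transparency at both
// segment vertices and the hit position u along the segment.
struct CurveShadowHitSketch {
  float transparency_v0;
  float transparency_v1;
  float u;
};

// Multiply the throughput by the interpolated baked transparency of
// each hit, without recording hits or evaluating shaders.
inline float accumulate_shadow_throughput(const CurveShadowHitSketch *hits, int num_hits)
{
  float throughput = 1.0f;
  for (int i = 0; i < num_hits; i++) {
    assert(hits[i].u >= 0.0f && hits[i].u <= 1.0f);
    throughput *= lerp_sketch(hits[i].transparency_v0, hits[i].transparency_v1, hits[i].u);
    if (throughput <= 0.0f) {
      break;  // Fully blocked, nothing more to accumulate.
    }
  }
  return throughput;
}
```

Everything needed is available at intersection time, which is what allows a single ray to pass through many layers of hair.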

Fixes T91428, performance regression with spring demo file due to transparent
hair, and makes it render significantly faster than Blender 2.93.

Differential Revision: https://developer.blender.org/D12880
2021-10-19 15:11:09 +02:00
Brecht Van Lommel
d06828f0b8 Cycles: avoid intermediate stack array for writing shadow intersections
Helps save one OptiX payload and is a bit more efficient.

Differential Revision: https://developer.blender.org/D12909
2021-10-19 15:10:55 +02:00
Brecht Van Lommel
1df3b51988 Cycles: replace integrator state argument macros
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
  that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
  replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros

In preparation for decoupling main and shadow paths.

Differential Revision: https://developer.blender.org/D12888
2021-10-18 19:02:10 +02:00
Brecht Van Lommel
5d565062ed Cleanup: refactor OptiX shadow intersection for upcoming changes 2021-10-15 15:42:44 +02:00
Brecht Van Lommel
eb71157e2a Cleanup: add utility functions for packing integers 2021-10-15 15:42:44 +02:00
Michael Jones
a0f269f682 Cycles: Kernel address space changes for MSL
This is the first of a sequence of changes to support compiling Cycles kernels as MSL (Metal Shading Language) in preparation for a Metal GPU device implementation.

MSL requires that all pointer types be declared with explicit address space attributes (device, thread, etc...). There is already precedent for this with Cycles' address space macros (ccl_global, ccl_private, etc...), therefore the first step of MSL-enablement is to apply these consistently. Line-for-line this represents the largest change required to enable MSL. Applying this change first will simplify future patches as well as offering the emergent benefit of enhanced descriptiveness.

The vast majority of deltas in this patch fall into one of two cases:

- Ensuring ccl_private is specified for thread-local pointer types
- Ensuring ccl_global is specified for device-wide pointer types

Additionally, the ccl_addr_space qualifier can be removed. Prior to Cycles X, ccl_addr_space was used as a context-dependent address space qualifier, but now it is either redundant (e.g. in struct typedefs), or can be replaced by ccl_global in the case of pointer types. Associated function variants (e.g. lcg_step_float_addrspace) are also redundant.

In cases where address space qualifiers are chained with "const", this patch places the address space qualifier first. The rationale for this is that the choice of address space is likely to have the greater impact on runtime performance and overall architecture.

The final part of this patch is the addition of a metal/compat.h header. This is partially complete and will be extended in future patches, paving the way for the full Metal implementation.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D12864
2021-10-14 16:14:43 +01:00
Brecht Van Lommel
04857cc8ef Cycles: fully decouple triangle and curve primitive storage from BVH2
Previously the storage here was optimized to avoid indirections in BVH2
traversal. This helps improve performance a bit, but makes performance
and memory usage of Embree and OptiX BVHs a bit worse also. It also adds
code complexity in other parts of the code.

Now decouple triangle and curve primitive storage from BVH2.
* Reduced peak memory usage on all devices
* Bit better performance for OptiX and Embree
* Bit worse performance for CUDA
* Simplified code:
** Intersection.prim/object now matches ShaderData.prim/object
** No more offset manipulation for mesh displacement before a BVH is built
** Remove primitive packing code and flags for Embree and OptiX
** Curve segments are now stored in a KernelCurve struct
* Also happens to fix a bug in baking with incorrect prim/object

Fixes T91968, T91770, T91902

Differential Revision: https://developer.blender.org/D12766
2021-10-06 17:52:04 +02:00
Patrick Mours
2189dfd6e2 Cycles: Rework OptiX visibility flags handling
Previously, the visibility test against the visibility flags was performed in an any-hit program in OptiX
(called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array.
This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility
flags of all objects that reference that primitive, so if one object uses different visibility flags
than another object, but they both are instances of the same geometry, they would appear the same
way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is
that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles
visibility flags (12 bits).

To mostly fix the problem with multiple instances and different visibility flags, I changed things to
use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA`
to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit
program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in
curves from OptiX, since the any-hit program is then also used to skip the curve endcaps).
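
The split between the 8-bit instance mask and the any-hit fallback can be sketched as follows. The function names and the use of the low 8 bits are assumptions of this sketch; the actual flag values of `PATH_RAY_CAMERA` and the other flags are not reproduced here:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the flags that fit the hardware mask occupy the low 8 bits.
constexpr uint32_t INSTANCE_MASK_SKETCH = (1u << 8) - 1u;

// Per-instance mask handed to the acceleration structure: only the
// subset of visibility flags that fits into 8 bits.
inline uint8_t instance_visibility_mask(uint32_t object_visibility)
{
  return static_cast<uint8_t>(object_visibility & INSTANCE_MASK_SKETCH);
}

// The slower any-hit visibility test is only needed when the ray asks
// for flags beyond the 8-bit mask, or when built-in curves require the
// any-hit program anyway to skip end caps.
inline bool needs_anyhit_visibility_test(uint32_t ray_visibility, bool uses_builtin_curves)
{
  return (ray_visibility & ~INSTANCE_MASK_SKETCH) != 0u || uses_builtin_curves;
}
```

In the common case the ray mask fits into 8 bits, so the hardware mask test alone decides visibility and no any-hit program has to run.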

This may also improve performance in some cases, since by default OptiX can now perform the normal
scene intersection trace calls entirely on RT cores without having to jump back to the SM on every
hit to execute the any-hit program.

Fixes T89801

Differential Revision: https://developer.blender.org/D12604
2021-09-27 17:12:43 +02:00
Campbell Barton
4d66cbd140 Cleanup: spelling in comments 2021-09-22 14:54:01 +10:00
Brecht Van Lommel
0803119725 Cycles: merge of cycles-x branch, a major update to the renderer
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.

Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.

Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles

Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)

For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.

Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T77185, T69800
2021-09-21 14:55:54 +02:00