by doing the subtraction first and then take the dot product. This
offers higher precision than taking the dot product first and then
subtract the result, especially in the cases where the ray origin is
very far away from the sphere center.
Use the common BVH utilities header for this.
Added a special type qualifier ccl_ray_data which is defined to ccl_private
for all platforms but Metal. On Metal it is defined to ray_data.
The tricky part is that the BVH utilities are wrapped into the Metal context
class. In some of the BVH functions the context has been already constructed,
but it wasn't done in all the callbacks.
From a quick render tests of the Junkshop benchmark scene there is no render
time difference,
No functional changes are expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/111967
The Metal backend combines all kernel sources into a single string
and passes it to the compiler. Doing so natively has a disadvantage
of making it hard to find sources of mistakes in code when working
on the kernel source as the source file name and line number info
is lost. For example, the error would look like:
```
program_source:187004:3: error: use of undeclared identifier 'metalrt_intersection_point_shadow_force_error'
metalrt_intersection_point_shadow_force_error(launch_params_metal,
^
```
This patch makes it so the #line pragmas are inserted into the
combined source source code, so that the compiler errors are easier
to find immediately. For example the error from above becomes
```
source/kernel/device/metal/kernel.metal:809:3: error: use of undeclared identifier 'metalrt_intersection_point_shadow_force_error'
metalrt_intersection_point_shadow_force_error(launch_params_metal,
^
```
The code is a slightly modified functionality from the source
processing used for OpenCL in Blender 2.80.
Pull Request: https://projects.blender.org/blender/blender/pulls/111959
based on concentric disk mapping.
Concentric disk mapping was already present, but not used everywhere.
Now `sample_cos_hemisphere()`, `sample_uniform_hemisphere()`, and
`sample_uniform_cone()` use concentric disk mapping.
This changes the noise in many test images.
Pull Request: https://projects.blender.org/blender/blender/pulls/109774
In the commonly used cycles headers, it's enough to include
much smaller <iosfwd> than the full <iostream>. While looking at it,
removed inclusion of some other headers from commonly used headers,
that seemed to not be needed.
Pull Request: https://projects.blender.org/blender/blender/pulls/111063
Both the `Math` node and the `Vector Math` currently only explicitly
support modulo using truncated division which is oftentimes not the
type of modulo desired as it behaves differently for negative numbers
and positive numbers.
Floored Modulo can be created by either using the `Wrap` operation or
a combination of multiple `Math` nodes. However both methods obfuscate
the actual intend of the artist and the math operation that is actually
used.
This patch adds modulo using floored division to the scalar `Math` node,
explicitly stating the intended math operation and renames the already
existing `"Modulo"` operation to `"Truncated Modulo"` to avoid confusion.
Only the ui name is changed, so this should not break compatibility.
Pull Request: https://projects.blender.org/blender/blender/pulls/110728
This fixes the issue described in https://projects.blender.org/blender/blender/issues/108957.
Instead of modeling distant lights like a disk light at infinity, it models them as cones. This way, the radiance is constant across the entire range of directions that it covers.
For smaller angles, the difference is very subtle, but for very large angles it becomes obvious (here's the file from #108957, the angle is 179°):
| Old | New |
| - | - |
|  |  |
One notable detail is the sampling method: Using `sample_uniform_cone` can increase noise, since the sampling method no longer preserves the stratification of the samples. This is visible in the "light tree multi distant" test scene.
Turns out we can do better, and after a bit of testing I found a way to adapt the concentric Shirley mapping to uniform cone sampling. I hope the comment explains the logic behind it reasonably well.
Here's the result, note that even the noise distribution is the same when using the new sampling:
| Method | Old | New, basic sampling | New, concentric sampling |
| - | - |- | - |
| Image |  |  |  |
| Render time (at higher spp)| 9.03sec | 8.79sec | 8.96sec |
I'm not sure if I got the `light->normalized` handling right, since I don't really know what the expectation from Hydra is here.
Co-authored-by: Weizhen Huang <weizhen@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/108996
for energy preservation and better compatibility with other renderes. Ref: #108505
Point light now behaves the same as a spherical mesh light with the same overall energy (scaling from emission strength to power is \(4\pi^2R^2\)).
# Cycles
## Comparison
| Mesh Light | This patch | Previous behavior |
| -------- | -------- | -------- |
|  |  |  |
The behavior stays the same when `radius = 0`.
| This patch | Previous behavior |
| -------- | -------- |
|  |  |
No obvious performance change observed.
## Sampling
When shading point lies outside the sphere, sample the spanned solid angle uniformly.
When shading point lies inside the sphere, sample spherical direction uniformly when inside volume or the surface is transmissive, otherwise sample cosine-weighted upper hemisphere.
## Light Tree
When shading point lies outside the sphere, treat as a disk light spanning the same solid angle.
When shading point lies inside the sphere, it behaves like a background light, with estimated outgoing radiance
\[L_o=\int f_aL_i\cos\theta_i\mathrm{d}\omega_i=\int f_a\frac{E}{\pi r^2}\cos\theta_i\mathrm{d}\omega_i\approx f_a \frac{E}{r^2}\],
with \(f_a\) being the BSDF and \(E\) `measure.energy` in `light_tree.cpp`.
The importance calculation for `LIGHT_POINT` is
\[L_o=f_a E\cos\theta_i\frac{\cos\theta}{d^2}\].
Consider `min_importance = 0` because maximal incidence angle is \(\pi\), we could substitute \(d^2\) with \(\frac{r^2}{2}\) so the averaged outgoing radiance is \(f_a \frac{E}{r^2}\).
This only holds for non-transmissive surface, but should be fine to use in volume.
# EEVEE
When shading point lies outside the sphere, the sphere light is equivalent to a disk light spanning the same solid angle. The sine of the new half-angle is the tangent of the previous half-angle.
When shading point lies inside the sphere, integrating over the cosine-weighted hemisphere gives 1.0.
## Comparison with Cycles
The plane is diffuse, the blue sphere has specular component.
| Before | |After ||
|---|--|--|--|
|Cycles|EEVEE|Cycles|EEVEE|
|||||
Pull Request: https://projects.blender.org/blender/blender/pulls/108506
Fractal noise is the idea of evaluating the same noise function multiple times with
different input parameters on each layer and then mixing the results. The individual
layers are usually called octaves.
The number of layers is controlled with a "Detail" slider.
The "Lacunarity" input controls a factor by which each successive layer gets scaled.
The existing Noise node already supports fractal noise. Now the Voronoi Noise node
supports it as well. The node also has a new "Normalize" property that ensures that
the output values stay in a [0.0, 1.0] range. That is except for the F2 feature where
in rare cases the output may be outside that range even with "Normalize" turned on.
How the individual octaves are mixed depends on the feature and output socket:
- F1/Smooth F1/F2:
- Distance/Color output:
The individual Distance/Color octaves are first multiplied by a factor of
`Roughness ^ (#layers - 1.0)` then added together to create the final output.
- Position output:
Each Position octave gets linearly interpolated with the combined output of the
previous octaves. The Roughness input serves as an interpolation factor with
0.0 resutling in only using the combined output of the previous octaves and
1.0 resulting in only using the current highest octave.
- Distance to Edge:
- Distance output:
The Distance octaves are mixed exactly like the Position octaves for F1/Smooth F1/F2.
It should be noted that Voronoi Noise is a relatively slow noise function, especially
at higher dimensions. Increasing the "Detail" makes it even slower. Therefore, when
optimizing a scene one should consider trying to use simpler noise functions instead
of Voronoi if the final result is close enough.
Pull Request: https://projects.blender.org/blender/blender/pulls/106827
This patch removes a workaround for an issue that is now understood to be undefined behaviour (and fixed by #108176). It also adds two useful debug flags that we would like to be available in Blender 3.6.
Pull Request: https://projects.blender.org/blender/blender/pulls/108322
There were two issues here preventing the proper display of the IES
files in question.
The primary one was that these lights are actually vertical. Their
profiles actually point upwards from 90deg to 180deg but our parser was
trying hard to adjust it to start at 0deg incorrectly.
Lastly, the files in question ended with the parser in the `eof`
state - they are "missing" the final carriage return that other IES
files tend to have but other viewers don't seem to mind. Change the
`eof` check instead for a better one that will indicate if any parsing
errors occurred along the way.
Pull Request: https://projects.blender.org/blender/blender/pulls/107320
MSVC can't optimize it out and even keeps an external call to CRT
function rintf: https://godbolt.org/z/Ex9vjf8vj
It does translate to a real speedup on windows on some scenes, here are the ratios I had on my 13900K:
classroom 101.53%
junkshop 100.71%
monster 100.76%
attic 107.98%
bistro 113.00%
Pull Request: https://projects.blender.org/blender/blender/pulls/107371
Since the version v1.5.0 of sse2neon the functionality for denormals
flushing is implemented in the library. This commit makes it so the
_MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE are used from
the ss2neon if available.
This solves macro re-definition when a newer sse2neon is used.
The change is implemented in a way that both current and new sse2neon
are supported.
Updated Embree 4 library with GPU support is required for it to be
compiled - compatiblity with Embree 3 and Embree 4 without GPU support
is maintained.
Enabling hardware raytracing is an opt-in user setting for now.
Pull Request: https://projects.blender.org/blender/blender/pulls/106266
Use raw Blender structs and mesh data rather than using the RNA API.
There isn't any benefit from using the RNA when Cycles is compiled
with Blender anyway, and a profile showed that the majority of time
was spent in Blender RNA API functions.
This gives a significant improvement in performance when ingesting
meshes. Here are some tests of the runtime of the `create_mesh`
function (in seconds):
| | Before | After |
| ------------------------- | ------ | ----- |
| Grid | 0.66 | 0.11 |
| Many realized cubes | 2.60 | 0.48 |
| Large curve to mesh setup | 4.18 | 1.14 |
Also change to resizing the arrays and filling them by index rather
than appending. This makes the parallel aspect of the logic clearer,
and makes the loops easier to parallelize in the future, and makes
it easier to have a performance benefit when an attribute like
`sharp_face` doesn't exist.
Pull Request: https://projects.blender.org/blender/blender/pulls/106275
For example
```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```
becomes
```
OIIOOutputDriver::~OIIOOutputDriver() {}
```
Saves quite some vertical space, which is especially handy for
constructors.
Pull Request: https://projects.blender.org/blender/blender/pulls/105594
This commit replaces the current Glass approach, where Glass is a virtual closure
that gets replaced with a Glossy and a Refractive closure, with a combined
closure that handles Fresnel after sampling the microfacet. That way, the Fresnel
term is more accurate since it accounts for the microfacet normal, not the
shading normal.
Also updates the BSDF sampling to use a 3D sampler now, since we need two
dimensions to pick the microfacet normal and then a third dimension to pick
reflection/refraction. This can also be used to get rid of the LCG in the
Principled Hair BSDF, which means we can remove it altogether once MultiGGX is
gone.
Also, "sharp" is now supported as a microfacet distribution in OSL, and 2
is supported as the "refract" argument to microfacet() in order to get glass.
To improve mesh upload speeds and reduce the size of the scene data which allows larger scenes to be rendered.
The meshes in Cycles are currently stored as flattened meshes, where each triangle is stored as a set of 3 vertices. Unflattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and for current hardware does not provide a noticeable benefit. This change unflattens the mesh by directly using the meshes vertex and index buffers directly and skips the unflattening. This change allows for larger scenes and also a reduction in the sizes of the meshes. Further it results in a decrease the amount of time it takes to upload the data to a GPU. This is especially important for when multiple GPUs are used in a single machine.
Pull Request #105173