- Connectivity length was overwritten by distance to closest selected.
- Vertices used the 'island' center of the closest vertex,
even if it wasn't connected.
Now optionally keep track of the original index of used as the closest
connected distance.
To support this needed to add optional support for islands of 1 vertex.
The changes introduced in rB3e628eefa9f55fac7b0faaec4fd4392c2de6b20e
made the non-subframe frame change behaviour less intuitive, by always
truncating downwards, instead of rounding to the nearest frame instead.
This made the UI a lot less forgiving of pointing precision errors
(for example, as a result of hand shake, or using a tablet on a highres scren)
This commit restores the old behaviour in this case only (subframe inspection
isn't affected by these changes)
Selection loop would draw the selection ignoring xray.
Now draw in a separate pass after clearing the depth buffer,
as with regular drawing.
Also disable depth sorting,
caller can sort the hit-list by depth if needed.
Single program generally compiles kernels faster (2-3 times), loads faster,
takes less drive space (2-3 times), and reduces the number of cached kernels.
Reduces memory allocation for split kernel.
This allows for faster rendering due to bigger global size,
specially when GPU memory is limited.
Perfromance results:
R9 290 total render time
Before After Change
BMW 4:37 4:34 -1.1 %
Classroom 14:43 14:30 -1.5 %
Fishy Cat 11:20 11:04 -2.4 %
Koro 12:11 12:04 -1.0 %
Pabellon Barcelona 22:01 20:44 -5.8 %
Pabellon Barcelona(*) 15:32 15:09 -2.5 %
(*) without glossy connected to volume
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
Intended to replace legacy GL_SELECT, without the limitations of
sample queries which can't access depth information.
This commit adds VIEW3D_SELECT_PICK_NEAREST and VIEW3D_SELECT_PICK_ALL
which access the depth buffers to detect whats under the pointer,
so initial selection is always the closest item.
The performance of this method depends a lot on the OpenGL
implementations glReadPixels.
Since reading depth can be slow, buffers are cached for object picking
so selecting re-uses depth data, performing 1 draw instead of 3
(for 24, 18, 10 px regions, picking with many items under the pointer).
Occlusion queries draw twice when picking nearest,
so worst case 6x draw calls per selection.
Even with these improvements occlusion queries is faster on AMD hardware.
Depth selection is disabled by default, toggle option under select method.
May enable by default if this works well on different hardware.
Reviewed as D2543
The issue was caused by sometimes negative color returned by the filter node.
Seems to be caused by precision issues. Don't see any reason why we would want
negative colors in output. Those only causing issues later on.
By calculating the size of the state buffer in the kernel rather than the host
less code is needed and the size actually reflects the requested features.
Will also be a little faster in some cases because of larger global work size.
Because the split kernel can render multiple samples in parallel it is
necessary to have everything initialized before rendering of any samples
begins. The code that normally handles initialization of
`rng_state` (`kernel_path_trace_setup()`) only does so for the first sample,
which was causing artifacts in the split kernel due to uninitialized
`rng_state` for some samples.
Note that because the split kernel can render samples in parallel this
means that the split kernel is incompatible with the LCG.
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.
Real reason for removal tho is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
This is to help debug and track memory usage for generic buffers. We
have similar for textures already since those require a name, but for
buffers the name is only for debugging proposes.
Simple workaround for some issues we've been having with AMD drivers hanging
and rendering systems unresponsive. Unfortunately this makes things a bit
slower, but its better than having to do hard reboots. Will be removed when
drivers have been fixed.
Define CYCLES_DISABLE_DRIVER_WORKAROUNDS to disable for testing purposes.
This does a few things at once:
- Refactors host side split kernel logic into a new device
agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
allows as many threads as will fit in memory regardless of tile size, which
can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
number of arguments passed to kernels. Means there's less code to deal
with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering