Add a safe version of normalize since all uses of normalize
did zero length checks, move this into a function.
Also avoid unnecessary conversion.
Gives minor speedup here (approx 3-5%).
If there was any specularity in the Principled BSDF, it would get a sampling
weight of one regardless of its actual impact.
This commit makes Cycles estimate the contribution of the component and adjust
the weighting accordingly, which greatly improves the noise characteristics of
the Principled BSDF in many cases.
Note that this commit might slightly change the brightness of areas when using
MultiGGX and high roughnesses, but the new brightness is more accurate and
closer to the result of Branched Path Tracing. See T51836 for details.
Differential Revision: https://developer.blender.org/D2677
The PDF of the MultiGGX sampling is approximated by the singlescattering GGX
term as well as a scaled diffuse term that makes up for the energy in the
multiscattering component that's missed by GGX.
However, there were two problems with the glossy terms: The diffuse term missed
a normalization factor, and the singlescattering term was not properly scaled
down based on the albedo estimate.
The glass term was completely wrong and has been rewritten. It uses the fresnel
factor to weight reflection vs. refraction and uses the glossy MultiGGX model
for reflection.
For refraction, the correct singlescattering term is now used, and a new
albedo approximation is used that was derived by evaluating GGX albedo for
roughnesses from 0 to 1 and IORs from 1 to 3 and fitting numerical
approximations to it. The resulting model has a mean relative error of 9e-5,
but could probably be simplified without losing noticable accuracy in the
final render.
The improved PDFs help with glossy highlights (due to better light sampling vs.
closure sampling MIS) and fix the situation described in T51836 where mixing
MultiGGX with other closures (as it happens in e.g. the Principled
BSDF) causes incorrect darkening.
< Dependency graph Copy-on-Write >
--------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
This is an initial commit of Copy-on-write support added to dependency graph.
Main priority for now: get playback (Alt-A) and all operators (selection,
transform etc) to work with the new concept of clear separation between
evaluated data coming from dependency graph and original data coming from
.blend file (and stored in bmain).
= How does this work? =
The idea is to support Copy-on-Write on the ID level. This means, we duplicate
the whole ID before we cann it's evaluaiton function. This is currently done
in the following way:
- At the depsgraph construction time we create "shallow" copy of the ID
datablock, just so we know it's pointer in memory and can use for function
bindings.
- At the evaluaiton time, the copy of ID get's "expanded" (needs a better
name internally, so it does not conflict with expanding datablocks during
library linking), which means the content of the datablock is being
copied over and all IDs are getting remapped to the copied ones.
Currently we do the whole copy, in the future we will support some tricks
here to prevent duplicating geometry arrays (verts, edges, loops, faces
and polys) when we don't need that.
- Evaluation functions are operating on copied datablocks and never touching
original datablock.
- There are some cases when we need to know non-ID pointers for function
bindings. This mainly applies to scene collections and armatures. The
idea of dealing with this is to "expand" copy-on-write datablock at
the dependency graph build time. This might introduce some slowdown to the
dependency graph construction time, but allows us to have minimal changes
in the code and avoid any hash look-up from evaluation function (one of
the ideas to avoid using pointers as function bindings is to pass name
of layer or a bone to the evaluation function and look up actual data based
on that name).
Currently there is a special function in depsgraph which does such a
synchronization, in the future we might want to make it more generic.
At some point we need to synchronize copy-on-write version of datablock with
the original version. This happens, i.e., when we change active object or
change selection. We don't want any actual evaluation of update flush happening
for such thins, so now we have a special update tag:
DEG_id_tag_update((id, DEG_TAG_COPY_ON_WRITE)
- For the render engines we now have special call for the dependency graph to
give evaluated datablock for the given original one. This isn't fully ideal
but allows to have Cycles viewport render.
This is definitely a subject for further investigation / improvement.
This call will tag copy-on-write component tagged for update without causing
updates to be flushed to any other objects, causing chain reaction of updates.
This tag is handy when selection in the scene changes.
This basically summarizes ideas underneath this commit. The code should be
reasonably documented.
Here is a demo of dependency graph with all copy-on-write stuff in it:
https://developer.blender.org/F635468
= What to expect to (not) work? =
- Only meshes are properly-ish aware of copy-on-write currently, Non-mesh
geometry will probably crash or will not work at all.
- Armatures will need similar depsgraph built-time expansion of the copied
datablock.
- There are some extra tags / relations added, to keep things demo-able but
which are slowing things down for evaluation.
- Edit mode works for until click selection is used (due to the selection
code using EditDerivedMesh created ad-hoc).
- Lots of tools will lack tagging synchronization of copied datablock for
sync with original ID.
= How to move forward? =
There is some tedious work related on going over all the tools, checking
whether they need to work with original or final evaluated object and make
the required changes.
Additionally, there need synchronization tag done in fair amount of tools
and operators as well. For example, currently it's not possible to change
render engine without re-opening the file or forcing dependency graph for
re-build via python console.
There is also now some thoughts required about copying evaluated properties
between objects or from collection to a new object. Perhaps easiest way
would be to move base flag flush to Object ID node and tag new objects for
update instead of doing manual copy.
here is some WIP patch which moves such evaluaiton / flush:
https://developer.blender.org/F635479
Lots of TODOs in the code, with possible optimization.
= How to test? =
This is a feature under heavy development, so obviously it is disabled by
default. The only reason it goes to 2.8 branch is to avoid possible merge
hell.
In order to enable this feature use WITH_DEPSGRAPH_COPY_ON_WRITE CMake
configuration option.
The crash did not happen yet because we always had proper vmemh defined in
the parent scope.
Patch by Ivan Ivanov (aka obiwanus), thanks!
Differential Revision: https://developer.blender.org/D2715
This is not enough to mutex-guard modification code of integer values,
since this operation is NOT atomic. This is not even safe for a single
byte data types.
For now guarded the getter functions, similar to other functions in
this module.
Ideally we want to switch modification to an atomic operations, so we
wouldn't need any locks in the getters.
Was some mismatch in address space. Seems to be caused by recent additions.
Additionally, moved decoupled ray marching functions under ifdef, so they
don't try to use malloc() functions.
Thanks Mai for testing the patch!
Now, when there is no usable neighboring pixel for denoising, the noisy value
is preserved instead of producing a NaN.
Also, negative results are clamped to zero.
Note that there are just workarounds that don't fix the underlying problems,
but these issues are very rare and I'm not sure if it's even possible to fix
the underlying problems without introducing a significant slowdown or quality
decrease in other situations.
Because of that and since 2.79 is happening very soon, I just went for these
workarounds for now.
Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.
Unfortunately there a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse, I've
seen up to 50% slowdown. For the latter case I think adjusting
tile updating logic can help, but not sure what that would look
like yet (it would be just a few lines change however).
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means that are less samples for
the GPU to work on in parallel and rendering is slower. As there is less work
overall there is also more inactive threads during rendering with BPT. This
patch makes use of those inactive rays to render branched samples in parallel
with other samples.
Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, 100s of branched
samples could be generated from the same originating thread and ran in
parallel giving large speed ups.
Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
Due to various driver issues with AMD GCN 1 cards we can no longer support
these GPUs. This patch makes them unavailable to select for Cycles rendering.
GCN cards 2 and higher are still supported. Please use the most recent
drivers available to ensure proper functionality.
See here for a list to check which GPUs are supported:
https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
The previous outlier heuristic only checked whether the pixel is more than
twice as bright compared to the 75% quantile of the 5x5 neighborhood.
While this detected fireflies robustly, it also incorrectly marked a lot of
legitimate small highlights as outliers and filtered them away.
This commit adds an additional condition for marking a pixel as a firefly:
In addition to being above the reference brightness, the lower end of the
3-sigma confidence interval has to be below it.
Since the lower end approximates how low the true value of the pixel might be,
this test separates pixels that are supposed to be very bright from pixels that
are very bright due to random fireflies.
Also, since there is now a reliable outlier filter as a preprocessing step,
the additional confidence interval test in the reconstruction kernel is no
longer needed.
UNIFORM_NONE should never match a valid uniform (builtin or custom).
The logic for UNIFORM_CUSTOM was just wrong, since it returned the first custom uniform. This function should only accept builtin (non-custom) uniforms.
Quick hash rejection instead of string comparison. Uniform lookups already work this way. I don't expect a major overall speedup since attributes are looked up less frequently than uniforms.
The issue was caused by usage of address of dupli-object (which will vary
from iteration process to iteration process) as something denoting whether
we've got the data synchronized to Cycles or not.
For now solved by using address of original object (the one DupliObject
points to) as a pointer for the map.
Need to do more thoughts about this.
This commit extends the work from Dalai made around scene iterators to
support iterating into objects from dupli-lists.
Changes can be summarized as:
- Depsgraph iterator will hold pointer to an object which created current
duplilist. It is available via `dupli_parent` field of the iterator.
It is only set when duplilist is not NULL and guaranteed to be NULL
for all other cases.
- Introduced new depsgraph.duplis collection which gives a more extended
information about depsgraph iterator. It is basically a collection on top
of DEGObjectsIteratorData.
It is used to provide access to such data as persistent ID, generated space
and so on.
Things which still needs to be done/finished/clarified:
- Need to introduce some sort of `is_instance` boolean property which will
indicate Python and C++ RNA that we are inside of dupli-list.
- Introduce a way to skip dupli-list for particular objects.
So, for example, if we are culling object due to distance we can skip all
objects it was duplicating.
- Introduce a way to skip particular duplicators.
So we can skip iterating into particle system.
- Introduce some cleaner API for C side of operators to access all data such as
persistent ID and friends.
This way we wouldn't need de-reference iterator and could keep access to such
data really abstract. Who knows how we'll be storing internal state of the
operator in the future.
While there is still stuff to do, current state works and moves us in the proper
direction.
Before this change Gawain was doing list lookup twice,
doing string comparison of every and each input which
is not efficient and not friendly for CPUs with small
cache size.
Now we store hash of input name together with actual
name and compare hashes first. Additionally, we do
everything in a single pass which is much better from
cache coherency point of view.
This brings Eevee cache population time from 80ms to
60ms on my desktop and from 800ms to 400ms for Clement
when navigating in a file from T50027.
Reviewers: merwin, dfelinto
Subscribers: fclem
Differential Revision: https://developer.blender.org/D2697