This is not enough to mutex-guard modification code of integer values,
since this operation is NOT atomic. This is not even safe for a single
byte data types.
For now guarded the getter functions, similar to other functions in
this module.
Ideally we want to switch modification to an atomic operations, so we
wouldn't need any locks in the getters.
Was some mismatch in address space. Seems to be caused by recent additions.
Additionally, moved decoupled ray marching functions under ifdef, so they
don't try to use malloc() functions.
Thanks Mai for testing the patch!
Now, when there is no usable neighboring pixel for denoising, the noisy value
is preserved instead of producing a NaN.
Also, negative results are clamped to zero.
Note that there are just workarounds that don't fix the underlying problems,
but these issues are very rare and I'm not sure if it's even possible to fix
the underlying problems without introducing a significant slowdown or quality
decrease in other situations.
Because of that and since 2.79 is happening very soon, I just went for these
workarounds for now.
Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.
Unfortunately there a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse, I've
seen up to 50% slowdown. For the latter case I think adjusting
tile updating logic can help, but not sure what that would look
like yet (it would be just a few lines change however).
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means that are less samples for
the GPU to work on in parallel and rendering is slower. As there is less work
overall there is also more inactive threads during rendering with BPT. This
patch makes use of those inactive rays to render branched samples in parallel
with other samples.
Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, 100s of branched
samples could be generated from the same originating thread and ran in
parallel giving large speed ups.
Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
Due to various driver issues with AMD GCN 1 cards we can no longer support
these GPUs. This patch makes them unavailable to select for Cycles rendering.
GCN cards 2 and higher are still supported. Please use the most recent
drivers available to ensure proper functionality.
See here for a list to check which GPUs are supported:
https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
The previous outlier heuristic only checked whether the pixel is more than
twice as bright compared to the 75% quantile of the 5x5 neighborhood.
While this detected fireflies robustly, it also incorrectly marked a lot of
legitimate small highlights as outliers and filtered them away.
This commit adds an additional condition for marking a pixel as a firefly:
In addition to being above the reference brightness, the lower end of the
3-sigma confidence interval has to be below it.
Since the lower end approximates how low the true value of the pixel might be,
this test separates pixels that are supposed to be very bright from pixels that
are very bright due to random fireflies.
Also, since there is now a reliable outlier filter as a preprocessing step,
the additional confidence interval test in the reconstruction kernel is no
longer needed.
UNIFORM_NONE should never match a valid uniform (builtin or custom).
The logic for UNIFORM_CUSTOM was just wrong, since it returned the first custom uniform. This function should only accept builtin (non-custom) uniforms.
Quick hash rejection instead of string comparison. Uniform lookups already work this way. I don't expect a major overall speedup since attributes are looked up less frequently than uniforms.
The issue was caused by usage of address of dupli-object (which will vary
from iteration process to iteration process) as something denoting whether
we've got the data synchronized to Cycles or not.
For now solved by using address of original object (the one DupliObject
points to) as a pointer for the map.
Need to do more thoughts about this.
This commit extends the work from Dalai made around scene iterators to
support iterating into objects from dupli-lists.
Changes can be summarized as:
- Depsgraph iterator will hold pointer to an object which created current
duplilist. It is available via `dupli_parent` field of the iterator.
It is only set when duplilist is not NULL and guaranteed to be NULL
for all other cases.
- Introduced new depsgraph.duplis collection which gives a more extended
information about depsgraph iterator. It is basically a collection on top
of DEGObjectsIteratorData.
It is used to provide access to such data as persistent ID, generated space
and so on.
Things which still needs to be done/finished/clarified:
- Need to introduce some sort of `is_instance` boolean property which will
indicate Python and C++ RNA that we are inside of dupli-list.
- Introduce a way to skip dupli-list for particular objects.
So, for example, if we are culling object due to distance we can skip all
objects it was duplicating.
- Introduce a way to skip particular duplicators.
So we can skip iterating into particle system.
- Introduce some cleaner API for C side of operators to access all data such as
persistent ID and friends.
This way we wouldn't need de-reference iterator and could keep access to such
data really abstract. Who knows how we'll be storing internal state of the
operator in the future.
While there is still stuff to do, current state works and moves us in the proper
direction.
Before this change Gawain was doing list lookup twice,
doing string comparison of every and each input which
is not efficient and not friendly for CPUs with small
cache size.
Now we store hash of input name together with actual
name and compare hashes first. Additionally, we do
everything in a single pass which is much better from
cache coherency point of view.
This brings Eevee cache population time from 80ms to
60ms on my desktop and from 800ms to 400ms for Clement
when navigating in a file from T50027.
Reviewers: merwin, dfelinto
Subscribers: fclem
Differential Revision: https://developer.blender.org/D2697
Denoising was setting session parameters for every frame, which was detected as
a change and therefore caused a resync.
Since the parameter modification change is only needed for viewport rendering
(which doesn't support denoising anyways) and resyncing after a frame change
(which isn't affected by denoising settings), an easy fix is to just ignore
the denoising parameters like it's currently done with the samples.
Follow up to 9f044cb422
These comments described the difference between Microsoft & MinGW's struct definition. Now that we dropped MinGW we don't need to go into these details.