Update samples displayign in the interface once in a whiel only
(currently once in 1 sec). This still keeps rendering interactive
enough for artists and it eliminates slowdown caused by constant
uploading sampels from GPU to host.
This also allowed to get rid of hack with storing render result
in render buffers, due to it's not so big overhead ow to create
render result when sample needs to be updated.
Tests made with different files shows that now render speed is
really close to oroginal speed from before we started hackign
into this stuff.
There still some issues with interactive update when rendering
several render layers, but that seems to be irrelivant to all
this sampels commtis, so woudl look into this as a separate
patch.
Was requested by Mango team to improve feedback during rendering.
Known issues:
- Updating of samples are accumulative, meaning that visually samples
would be dark in the beginning becoming brighter during progress.
- Could give some % of slowdown, so probably should be disabled in
background mode.
Still to come: update of samples when using CUDA and OpenCL.
Seems this requred cuda context synchronization after every finished sample,
which could give few percent of slowdown. In test made here it was only minor
slowdown, so think it's pretty much acceptable for now.
save memory during render and cache render results.
Implementation notes:
In the render engine API it's now possible to get the render result for
one render layer only, and retrieve the expected tile size in case save
buffers is used. This is needed because EXR expects tiles with particular
size and coordinates.
The EXR temporary files are now also separated per layer, since Cycles
can't give the full render result for all render layers, and EXR doesn't
support writing parts of tiles.
In Cycles internally the handling of render buffers and multi GPU
rendering in particular changed quite a bit, and could use a bit more
refactoring to make things more consistent and simple.
other places, was mainly due to instancing not working, but also found
issues in procedural textures.
The problem was with --use_fast_math, this seems to now have way lower
precision for some operations. Disabled this flag and selectively use
fast math functions. Did not find performance regression on GTX 460 after
doing this.
* Multithreaded image loading, each thread can load a separate image.
* Better multithreading for multiple instanced meshes, different threads can now
build BVH's for different meshes, rather than all cooperating on the same mesh.
Especially noticeable for dynamic BVH building for the viewport, gave about
2x faster build on 8 core in fairly complex scene with many objects.
* The main thread waiting for worker threads can now also work itself, so
(num_cores + 1) threads will be working, this supposedly gives better
performance on some operating systems, but did not measure performance for
this very detailed yet.
=== BVH build time optimizations ===
* BVH building was multithreaded. Not all building is multithreaded, packing
and the initial bounding/splitting is still single threaded, but recursive
splitting is, which was the main bottleneck.
* Object splitting now uses binning rather than sorting of all elements, using
code from the Embree raytracer from Intel.
http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/
* Other small changes to avoid allocations, pack memory more tightly, avoid
some unnecessary operations, ...
These optimizations do not work yet when Spatial Splits are enabled, for that
more work is needed. There's also other optimizations still needed, in
particular for the case of many low poly objects, the packing step and node
memory allocation.
BVH raytracing time should remain about the same, but BVH build time should be
significantly reduced, test here show speedup of about 5x to 10x on a dual core
and 5x to 25x on an 8-core machine, depending on the scene.
=== Threads ===
Centralized task scheduler for multithreading, which is basically the
CPU device threading code wrapped into something reusable.
Basic idea is that there is a single TaskScheduler that keeps a pool of threads,
one for each core. Other places in the code can then create a TaskPool that they
can drop Tasks in to be executed by the scheduler, and wait for them to complete
or cancel them early.
=== Normal ====
Added a Normal output to the texture coordinate node. This currently
gives the object space normal, which is the same under object animation.
In the future this might become a "generated" normal so it's also stable for
deforming objects, but for now it's already useful for non-deforming objects.
=== Render Layers ===
Per render layer Samples control, leaving it to 0 will use the common scene
setting.
Environment pass will now render environment even if film is set to transparent.
Exclude Layers" added. Scene layers (all object that influence the render,
directly or indirectly) are shared between all render layers. However sometimes
it's useful to leave out some object influence for a particular render layer.
That's what this option allows you to do.
=== Filter Glossy ===
When using a value higher than 0.0, this will blur glossy reflections after
blurry bounces, to reduce noise at the cost of accuracy. 1.0 is a good
starting value to tweak.
Some light paths have a low probability of being found while contributing much
light to the pixel. As a result these light paths will be found in some pixels
and not in others, causing fireflies. An example of such a difficult path might
be a small light that is causing a small specular highlight on a sharp glossy
material, which we are seeing through a rough glossy material. With path tracing
it is difficult to find the specular highlight, but if we increase the roughness
on the material the highlight gets bigger and softer, and so easier to find.
Often this blurring will be hardly noticeable, because we are seeing it through
a blurry material anyway, but there are also cases where this will lead to a
loss of detail in lighting.
* Reverted the general activation of __KERNEL_SHADING__.
Better to handle this in the device file. This way each platform gets specifically what it is capable of atm.
* Nvidia has Shading + Multi Closure
* AMD (Apple) has only Clay Render
* AMD (non Apple) has Basic Shading
* Enable __KERNEL_SHADING__ per default for OpenCL.
This enables basic shading (color, emission, textures...) for AMD cards.
You need the latest AMD catalyst driver in order to have this work.
Currently supported passes:
* Combined, Z, Normal, Object Index, Material Index, Emission, Environment,
Diffuse/Glossy/Transmission x Direct/Indirect/Color
Not supported yet:
* UV, Vector, Mist
Only enabled for CPU devices at the moment, will do GPU tweaks tommorrow,
also for environment importance sampling.
Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Passes
By default lighting from the world is computed solely with indirect light
sampling. However for more complex environment maps this can be too noisy, as
sampling the BSDF may not easily find the highlights in the environment map
image. By enabling this option, the world background will be sampled as a lamp,
with lighter parts automatically given more samples.
Map Resolution specifies the size of the importance map (res x res). Before
rendering starts, an importance map is generated by "baking" a grayscale image
from the world shader. This will then be used to determine which parts of the
background are light and so should receive more samples than darker parts.
Higher resolutions will result in more accurate sampling but take more setup
time and memory.
Patch by Mike Farnsworth, thanks!
The rendering device is now set in User Preferences > System, where you can
choose between OpenCL/CUDA and devices. Per scene you can then still choose
to use CPU or GPU rendering.
Load balancing still needs to be improved, now it just splits the entire
render in two, that will be done in a separate commit.
lower than 1.3, since we're not officially supporting these. We're already not
providing CUDA binaries for these, so better make it clear when compiling from
source too.
* Compile all of cycles with -ffast-math again
* Add scons compilation of cuda binaries, tested on mac/linux.
* Add UI option for supported/experimental features, to make it
more clear what is supported, opencl/subdivision is experimental.
* Remove cycles xml exporter, was just for testing.
Some drivers don't support passing include paths with spaces in them, nor does
the opencl spec specify anything about how to quote/escape such paths, so for
now we just resolved #includes ourselves. Alternative would have been to use c
preprocessor, but this also resolves all #ifdefs, which we do not want.
* Reduce kernel arguments size, helps compile for apple nvidia.
* Fix use of unitialized variable in displace kernel.
* Use build flags in opencl kernel md5 hash.
* Reorganize code for kernel feature #defines a bit.
* Enable OpenCL Full Shading on NVIDIA cards.
Notes:
It makes not much sense to use OpenCL on a nVidia card (as it is slower compared to CUDA), but as OpenCL comes without dependencies, it's an good alternative if you don't want to install the CUDA toolkit or the build comes without CUDA kernels.
* Add back option to bundle CUDA kernel binaries with builds.
* Disable runtime CUDA kernel compilation on Windows, couldn't get this working,
since it seems to depend on visual studio being installed, even though for
this particular case it shouldn't be needed. CMake only at the moment.
* Runtime compilation on linux/mac should now work if nvcc is not installed in
the default location, but available in PATH.
The __imp__ prefix on glew lib linking errors should have been a good indication: the code was looking for the glew dll.
Bypassed by adding GLEW_STATIC to the definitions.
* It seems we have a problem compiling the CUDA kernel at runtime on Windows,
will need to investigate more how to solve this best, CPU render should go
fine though.
* Change OPENIMAGEIO to OPENIMAGEIO_ROOT_DIR on linux for consistency.