test2

Author	SHA1	Message	Date
Martijn Berger	28c1a860e2	Fix T39247 Changes to interpolation break texture allocation on sm35 and greater.	2014-03-19 07:37:18 +01:00
Martijn Berger	dd2dca2f7e	Add support for multiple interpolation modes on cycles image textures All textures are currently sampled bi-linear, with the exception of OSL, where texture sampling is fixed and set to smart bi-cubic. This patch adds user control to this setting. Added: - bits to DNA / RNA in the form of an enum for supporting multiple interpolation types - changes to the image texture node drawing code (add enum) - to ImageManager (this needs to know to allocate a second texture when the interpolation type is different) - to node compiler (pass on interpolation type) - to device tex_alloc, which also needs to get the concept of multiple interpolation types - implementation for doing non-interpolated lookup for cuda and cpu - implementation where we pass this along to osl (this makes OSL also do linear until I add smartcubic to the interface / DNA / RNA) Reviewers: brecht, dingto Reviewed By: brecht CC: dingto, venomgfx Differential Revision: https://developer.blender.org/D317	2014-03-07 23:16:33 +01:00
Martijn Berger	1d01675833	Cuda use streams and async to avoid busywaiting This switches api usage for cuda towards using more of the Async calls. Updating only once every second is sufficiently cheap that I don't think it is worth doing it less often. Reviewed By: brecht Differential Revision: https://developer.blender.org/D262	2014-03-06 20:51:46 +01:00
Brecht Van Lommel	6b1a4fc66e	Cycles CUDA: revert the `f1aeb2ccf4` and `84f958754` busywait fixes for now. It's unclear what kind of impact they have on performance at the moment, so I'd rather play it safe and postpone this for 2.71. Ref T38679, Ref T38712	2014-02-19 16:08:08 +01:00
Martijn Berger	f1aeb2ccf4	This is an attempted fix for T38679: Cycles GPU Performance Regression. From my testing this (what I should have done in the first place) reduces the regression a lot. Let's hope it is enough, or we have to go back to busy waiting.	2014-02-17 20:11:45 +01:00
Martijn Berger	0f91f56ce3	Cycles Network rendering, remove some exception throwing, replace with saner error handling This patch adds a network_error() function more like how other devices handle errors - it adds a check for errors on load_kernels to make sure we do not crash if rendering without a server. - it uses the non-throwing variation of boost::asio::read. Reviewers: brecht Reviewed By: brecht CC: brecht Differential Revision: https://developer.blender.org/D86	2014-02-05 21:55:51 +01:00
Martijn Berger	84f9587540	Cuda use streams and async to avoid busywaiting This is my first stab at this and is based on this IRC conversation: <mib2berlin> brecht: this is meaning as reminder only, I know you have other things to do > http://openvidia.sourceforge.net/index.php/Optimization_Notes#avoiding_busy_waits <brecht> mib2berlin: thanks, bookmarked Only tested on Ubuntu 14.04 / cuda 5.0, but I'll do some more testing tomorrow. Also unsure about the placement and the lifetime of the stream and the event. But creating / deleting these seems to incur a non-trivial cost. Reviewers: brecht Reviewed By: brecht CC: mib2berlin, dingto Differential Revision: https://developer.blender.org/D262	2014-01-28 18:40:08 +01:00
Thomas Dinges	de28a4d4b2	Cycles: Add an AVX kernel for CPU rendering. * AVX is available on Intel Sandy Bridge and newer and AMD Bulldozer and newer. * We don't use dedicated AVX intrinsics yet, but gcc auto vectorization gives a 3% performance improvement for Caminandes. Tested on an i5-3570, Linux x64. * No change for Windows yet, MSVC 2008 does not support AVX. Reviewed by: brecht Differential Revision: https://developer.blender.org/D216	2014-01-16 17:04:11 +01:00
Brecht Van Lommel	d9e52ac98b	Code cleanup: move half float functions to separate header file.	2014-01-15 15:29:22 +01:00
Thomas Dinges	5d88f7c7db	Cycles: Build SSE41 kernel per default, remove build option. This hopefully also fixes some compile errors on various systems.	2014-01-14 22:04:32 +01:00
Thomas Dinges	9351ac0d85	Cycles: Skip the compilation of the dedicated SSE2 kernel on x86-64, we can assume SSE2 here, so just re-use the regular one. Saves 500kb in the blender binary. Reviewed by: brecht Differential Revision: https://developer.blender.org/D199	2014-01-14 20:39:54 +01:00
Brecht Van Lommel	241fccaf6a	Fix T37817: cycles CUDA detection problem on Windows with non-ascii paths.	2014-01-11 00:47:58 +01:00
Thomas Dinges	ce6dce3b13	Code cleanup / Cycles: else/if for SSE41 kernel functions.	2014-01-06 03:22:14 +01:00
Thomas Dinges	ad0a3de3ce	Cycles / OpenCL: Let the OpenCL runtime determine its optimal work-group size automatically, by passing a NULL pointer here. This is recommended in the Intel OpenCL optimization docs (http://software.intel.com/en-us/vcsource/samples/optimizing-opencl) and I can confirm a small performance increase here (1-2% on nVidia OpenCL, up to 8% on Intel OpenCL).	2013-12-24 20:20:57 +01:00
Thomas Dinges	011ae78857	Cycles / OpenCL: Fix compile error on OS X After update to Mac OS X 10.9.1, OpenCL works now on my Intel CPU in the 2013 Macbook Pro (even the entire kernel). The Intel Iris Pro GPU still segfaults here though, even when all flags are disabled (building "clay like" kernel only). Maybe we need the -no-missing-prototypes for AMD hardware still, but I couldn't find a way to distinguish here.	2013-12-17 09:59:18 +01:00
Martijn Berger	85a0c5d4e1	Cycles: network render code updated for latest changes and improved This actually works somewhat now, although viewport rendering is broken and any kind of network error or connection failure will kill Blender. * Experimental WITH_CYCLES_NETWORK cmake option * Networked Device is shown as an option next to CPU and GPU Compute * Various updates to work with the latest Cycles code * Locks and thread safety for RPC calls and tiles * Refactored pointer mapping code * Fix error in CPU brand string retrieval code This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel. Reviewers: brecht Differential Revision: http://developer.blender.org/D36	2013-12-07 12:26:58 +01:00
Martijn Berger	e3a79258d1	Cycles: test code for sse 4.1 kernel and alignment for some vector types. This is mostly work towards enabling the __KERNEL_SSE__ option to start using SIMD operations for vector math operations. This 4.1 kernel performs about 8% faster with that option but overall is still slower than without the option. WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel. Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2% speedup on tested systems with the current kernel already, so is enabled now.	2013-11-22 14:42:41 +01:00
Campbell Barton	48c1e0c0fc	spelling: use American spelling for canceled	2013-10-26 01:06:19 +00:00
Brecht Van Lommel	451607630e	Fix #37134 : cycles viewport not displaying correctly with multi GPU render and graphics card that does not support CUDA OpenGL interop.	2013-10-18 20:11:07 +00:00
Brecht Van Lommel	9d7567d6ac	Fix #37002 : cycles viewport render shows white on old graphics cards with no support for non-power-of-two textures.	2013-10-12 13:55:52 +00:00
Thomas Dinges	b5a5773fa9	Cycles / CUDA: * Remove support for CUDA Toolkit 4.x, only Toolkit 5.0 and above are supported now. * Remove support for sm_1x cards (< Fermi) for good. We didn't officially support those cards for a few releases already, now remove some special code that was still there.	2013-10-08 15:29:28 +00:00
Brecht Van Lommel	cbb783f1d6	Fix cycles OpenCL compile error on AMD, and fix assert in debug builds.	2013-10-02 14:41:04 +00:00
Brecht Van Lommel	31e6181187	Fix #36873 : cycles opencl render status show negative sample count.	2013-09-30 12:11:25 +00:00
Brecht Van Lommel	fa352bb749	Fix #35684 : cycles unable to use full 6GB of memory on NVidia Titan GPU. We now use arrays instead of textures for general storage on this card (image textures are still stored as texture). Textures were found to be faster on older cards, but the limits on 1D texture size have not increased along with the memory size, which meant that the full 6 GB could not be used. The performance actually seems to be slightly better with arrays in some tests on Titan. For older cards there seems to be a bit of a mix, some are better and others not. We may change those to use arrays too, but more testing is needed, only Titan and Tesla K20 (sm_35) is changed for now. The fact that arrays are faster is a bit surprising, as others found textures to be faster on Kepler. However even if they were, the memory limitation is more important to solve anyway. https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum	2013-09-27 19:09:31 +00:00
Thomas Dinges	cb19d9fa35	Code cleanup / Cycles: * Removed unused member of the device_memory template.	2013-09-04 16:24:58 +00:00
Brecht Van Lommel	29f6616d60	Cycles: viewport render now takes scene color management settings into account, except for curves, that's still missing from the OpenColorIO GLSL shader. The pixels are stored in a half float texture, converted from full float with native GPU instructions and SIMD on the CPU, so it should be pretty quick. Using a GLSL shader is useful for GPU render because it avoids a copy through CPU memory.	2013-08-30 23:49:38 +00:00
Brecht Van Lommel	6785874e7a	Fix #36137 : cycles render not using all GPUs when the number of GPUs is larger than the number of CPU threads	2013-08-30 23:09:22 +00:00
Brecht Van Lommel	01e22d1b9f	Cycles: more code refactoring to rename things internally as well. Also change property name back so we keep compatibility.	2013-08-23 14:34:34 +00:00
Brecht Van Lommel	b9ce231060	Cycles: relicense GNU GPL source code to Apache version 2.0. More information in this post: http://code.blender.org/ Thanks to all contributors for giving their permission!	2013-08-18 14:16:15 +00:00
Thomas Dinges	743a7a4a4b	Cycles: * GPU kernel can now be compiled without __NON_PROGRESSIVE__ again, was broken after my last commit. Also add a check for have_error(), in case the GPU kernel comes without Non-Progressive, to avoid a crash. * Don't compile progressive kernel twice on CPU, if __NON_PROGRESSIVE__ would be disabled there.	2013-08-09 20:03:49 +00:00
Thomas Dinges	a18112249d	Cycles / Non-Progressive integrator: * Non-Progressive integrator is now available on the GPU (CUDA, sm_20 and above). Implementation details: * kernel_path_trace() has been split up into two functions: kernel_path_trace_non_progressive() and kernel_path_trace_progressive(). * We compile two CUDA kernel entry functions (in kernel.cu) for the two integrators, they are still inside one .cubin file but due to the kernel separation there should be no performance problem. I tested with the BMW file on my Geforce 540M and the render times were the same for 100 samples (1.57 min in my case). This is part of my GSoC project, SVN merge of r59032 + manual merge of UI changes for this from my branch.	2013-08-09 18:47:25 +00:00
Thomas Dinges	9732c6283e	Cycles / CPU Rendering: * "Auto Detect" now again uses the number of cores, instead of the number of cores + 1. This was added before we had Tile rendering and benchmarks on several systems showed that there is no gain with this now. There might be some slight difference (0.5% or so) slower/faster depending on the scene, but this is negligible.	2013-07-20 00:40:03 +00:00
Brecht Van Lommel	7902fa57b6	Code cleanup: cycles * Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit. * Remove CUDA kernel launch size exception on Mac, is not needed. * Make OSL file compilation quiet like c/cpp files.	2013-06-26 23:29:33 +00:00
Brecht Van Lommel	e11e30aadf	Fix Cycles OpenCL issue if context/program creation fails, mistake by me, patch #35866 by Doug Gale to fix it.	2013-06-26 12:24:33 +00:00
Brecht Van Lommel	2e3035dd80	Cycles OpenCL: make displacement and world importance sampling work.	2013-06-21 13:05:08 +00:00
Brecht Van Lommel	8d6e5e2fee	Cycles: update build configurations to include CUDA sm_35 architecture. When using a compiler older than CUDA 5.0 it will give a warning and skip this architecture.	2013-06-20 13:10:47 +00:00
Brecht Van Lommel	16204bd647	Cycles: prepare to make CUDA 5.0 the official version we use * Add CUDA compiler version detection to cmake/scons/runtime * Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x is used, these were workarounds for CUDA 4.2 bugs * Change max number of registers to 32 for sm 2.x (based on performance tests from Martijn Berger and confirmed here), and also for NVidia OpenCL. Overall it seems that with these changes and the latest CUDA 5.0 download, that performance is as good as or better than the 2.67b release with the scenes and graphics cards I tested.	2013-06-19 17:54:23 +00:00
Brecht Van Lommel	0ad88d1001	Fix another windows / msvc build error.	2013-06-01 02:39:34 +00:00
Brecht Van Lommel	4f056d1be7	Fix windows / msvc build error.	2013-06-01 02:28:57 +00:00
Brecht Van Lommel	2d0a586c29	Cycles OpenCL: keep the opencl context and program around for quicker rendering the second time, as for example Intel CPU startup time is 9 seconds. * Adds a cache for contexts and programs for each platform and device pair, which also ensures that no two threads try to compile and write the binary cache file at the same time. * Change clFinish to clFlush so we don't block until the result is done, instead it will block at the moment we copy back memory. * Fix error in Cycles time_sleep implementation, does not affect any active code though. * Adds some (disabled) debugging code in the task scheduler. Patch #35559 by Doug Gale.	2013-05-31 16:19:03 +00:00
Thomas Dinges	722680d7cf	Cycles / OpenCL: * Use advanced shading for nvidia as well, works fine on my Geforce 540M with sm_21. I tested the files from regression suite.	2013-05-27 17:13:36 +00:00
Brecht Van Lommel	4bdb54a76e	Cycles OpenCL: patch #35514 by Doug Gale * Support using devices from all OpenCL platforms, so that you can use e.g. both Intel and NVidia OpenCL implementations if you have them installed. * Fix compile error due to missing fmodf after recent math node change. * Enable advanced shading for Intel OpenCL. * CYCLES_OPENCL_DEBUG environment variable for generating debug symbols so you can debug with gdb. This crashes the compiler with Intel OpenCL on Linux though. To make this work the preprocessed kernel source code is written out, as gdb needs this. * Show OpenCL compiler warnings even if the build succeeded. * Some small fixes to initialize cdDevice to NULL, add missing NULL check when creating buffer and add missing space at end of build options for Apple OpenCL. * Fix crash with multi device + opencl, now e.g. CPU + GPU render should work. I did a few tweaks to the code and also: * Fix viewport render failing sometimes with Apple CPU OpenCL, was not taking workgroup size limits into account properly. * Add compile error when advanced shading in the Blender binary and OpenCL kernel are not in sync.	2013-05-27 16:21:07 +00:00
Thomas Dinges	11707119de	Cycles: * Code cleanup, remove unused "resolution" variable from the DeviceTask class, was never used.	2013-05-14 21:18:20 +00:00
Brecht Van Lommel	cd3283f573	Cycles CUDA: in case of cryptic error messages in the console, refer to wiki documentation for possible solutions.	2013-05-13 21:36:48 +00:00
Thomas Dinges	522eeaa6a0	Cycles / OpenCL: * Remove old comment for sm_13 cards and really check for OpenCL 1.1.	2013-05-09 16:16:41 +00:00
Brecht Van Lommel	d0ffbeec73	Cycles OpenCL: a few fixes to get things compiling after kernel changes, for Apple OpenCL on OS X 10.8 and simple AO render. Also environment variable CYCLES_OPENCL_TEST can now be set to CPU, GPU, ACCELERATOR, DEFAULT or ALL values to test particular devices.	2013-05-09 14:05:40 +00:00
Brecht Van Lommel	40b05d364e	Cycles: code refactoring to add generic lookup table memory.	2013-04-01 20:26:43 +00:00
Thomas Dinges	50c28740d4	Cycles / CUDA: * Simplify Computing Capability Check, only check for major.	2013-03-17 14:32:50 +00:00
Thomas Dinges	dc90ce5b6d	Cycles GPU rendering: * Deprecate computing capability 1.3 (sm_13) This commit disables auto build of sm_13 CUDA platform, which means that starting with Blender 2.67, we don't support sm_13 devices anymore. It has become difficult to support that and it was already feature incomplete (no render-passes, AO, Multi Closure etc). It's still possible to manually enable sm_13 for own tests, but building might break in the future.	2013-02-21 17:14:07 +00:00
Thomas Dinges	a239700f43	Cycles: * Code cleanup, remove deprecated support_advanced_shading() functions. Left over from r43734.	2013-02-21 17:10:14 +00:00

142 Commits