test2

Author	SHA1	Message	Date
Lukas Stockner	2069102c56	Cycles: Fix constness for load_kernels in device_cpu.cpp	2017-12-06 00:00:18 +01:00
Lukas Stockner	fa3d50af95	Cycles: Improve denoising speed on GPUs with small tile sizes Previously, the NLM kernels would be launched once per offset with one thread per pixel. However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs, which results in a significant slowdown. Therefore, the kernels are now launched in a single call that handles all offsets at once. This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory. On the other hand, of course, the smaller tiles significantly reduce the size of the memory. The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum. I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere. To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented; it should be easier to understand what's going on now. Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.	2017-11-30 07:37:08 +01:00
Lukas Stockner	40f528a7da	Cycles: Add per-tile render time debug pass Reviewers: sergey, brecht Differential Revision: https://developer.blender.org/D2920	2017-11-17 16:40:24 +01:00
Brecht Van Lommel	e568c1a975	Fix T53289: CUDA missing textures not showing pink, after recent changes.	2017-11-12 20:45:47 +01:00
Mai Lavelle	e389ae9dca	Cycles: Set error if a split kernel fails to load To help catch cases where adding a new kernel is missed for one of the device implementations.	2017-11-11 01:01:14 -05:00
Brecht Van Lommel	bd4bea3e98	Cycles: avoid reallocating tile denoising memory many times during render.	2017-11-09 20:28:00 +01:00
Mai Lavelle	087331c495	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable Goal is to reduce OpenCL kernel recompilations. Currently viewport renders are still set to use 64 closures as this seems to be faster and we don't want to cause a performance regression there. Needs to be investigated. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2775	2017-11-09 01:04:06 -05:00
Brecht Van Lommel	f79f386731	Code refactor: rename subsurface to local traversal, for reuse.	2017-11-07 22:35:12 +01:00
Brecht Van Lommel	ff34e48911	Cycles: add an extra CUDA synchronize before rendering. It should not be needed as far as I know, but just in case it fixes any of the recent issues like T52572.	2017-11-07 22:35:12 +01:00
Brecht Van Lommel	5801ef71e4	Code refactor: device memory cleanups, preparing for mapped host memory.	2017-11-05 15:22:04 +01:00
Brecht Van Lommel	5475314f49	Cycles: reserve CUDA local memory ahead of time. This way we can log the amount of memory used, and it will be important for host mapped memory support.	2017-11-05 15:22:04 +01:00
Brecht Van Lommel	33b5e8daff	Code refactor: replace CUDA array with linear memory for 1D and 2D textures. This is a prerequisite for getting host memory allocation to work. There appears to be no support for 3D textures using host memory. The original version of this code was written by Stefan Werner for D2056.	2017-11-04 02:23:00 +01:00
Brecht Van Lommel	6ec599c682	Fix T53247: mixed CPU + GPU render wrong texture limits.	2017-11-03 20:32:29 +01:00
Mai Lavelle	5cb8730689	Cycles: Add another limit to OpenCL memory usage Some drivers may report very large allocation sizes, which could cause unnecessary memory usage. This is now limited to 2 GB, which should still be enough to get the needed performance benefits without waste.	2017-11-02 08:14:21 -04:00
Brecht Van Lommel	83877632a3	Fix one more assert being triggered due to recent changes.	2017-10-25 01:22:16 +02:00
Brecht Van Lommel	34fe3f9c06	Code refactor: remove MEM_WRITE_ONLY, always use MEM_READ_WRITE. It's unlikely the driver can do useful optimizations with this, and if we sum multiple samples we are reading from the memory anyway.	2017-10-24 23:53:09 +02:00
Brecht Van Lommel	ec49503a33	Fix T53146: incomplete multi GPU and CPU + GPU memory statistics. Part due to recent changes, part old bug.	2017-10-24 17:40:43 +02:00
Sergey Sharybin	e03df90bf3	Cycles: Fix compilation in debug mode Please check compilation before committing refactor changes!	2017-10-24 12:09:02 +02:00
Sergey Sharybin	eccd18a91f	Cycles: Fix compilation error without C++11	2017-10-24 11:14:01 +02:00
Brecht Van Lommel	a1aad1f8d1	Fix T53134: denoising with CPU + GPU render leaves some tiles noisy.	2017-10-24 04:09:48 +02:00
Brecht Van Lommel	070a668d04	Code refactor: move more memory allocation logic into device API. * Remove tex_* and pixels_* functions, replace by mem_. Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices. * No longer create device_memory and call mem_* directly, always go through device_only_memory, device_vector and device_pixels.	2017-10-24 01:25:19 +02:00
Brecht Van Lommel	aa8b4c5d81	Code refactor: use device_only_memory and device_vector in more places.	2017-10-24 01:25:13 +02:00
Brecht Van Lommel	7ad9333fad	Code refactor: store device/interp/extension/type in each device_memory.	2017-10-24 01:03:59 +02:00
Brecht Van Lommel	ae41f38f78	Code refactor: pass device to scene, check OSL with device info.	2017-10-24 01:03:59 +02:00
Brecht Van Lommel	57a0cb797d	Code refactor: avoid some unnecessary device memory copying.	2017-10-21 20:58:28 +02:00
Brecht Van Lommel	dc9eb8234f	Cycles: combined CPU + GPU rendering support. CPU rendering will be restricted to a BVH2, which is not ideal for raytracing performance but can be shared with the GPU. Decoupled volume shading will be disabled to match GPU volume sampling. The number of CPU rendering threads is reduced to leave one core dedicated to each GPU. Viewport rendering will also only use GPU rendering still. So along with the BVH2 usage, perfect scaling should not be expected. Go to User Preferences > System to enable the CPU to render alongside the GPU. Differential Revision: https://developer.blender.org/D2873	2017-10-21 20:13:44 +02:00
Sergey Sharybin	910dd7fb1b	Cycles: Add extra logging in CUDA device detection code	2017-10-19 11:26:10 +02:00
Brecht Van Lommel	92611dada6	Fix T53098, T53079: OpenCL world texture errors after recent changes.	2017-10-18 03:13:25 +02:00
Sergey Sharybin	4782000fd5	Cycles: Fix possible race condition when initializing devices list	2017-10-11 12:48:19 +05:00
Brecht Van Lommel	e360d003ea	Cycles: schedule more work for non-display and compute preemption CUDA cards. This change affects CUDA GPUs not connected to a display or connected to a display but supporting compute preemption so that the display does not freeze. I couldn't find an official list, but compute preemption seems to be only supported with GTX 1070+ and Linux (not GTX 1060- or Windows). This helps improve small tile rendering performance further if there are sufficient samples x number of pixels in a single tile to keep the GPU busy.	2017-10-08 21:12:16 +02:00
Mathieu Menuet	5aa08eb3cc	Fix T53017: Cycles not detecting AMD GPU when there is an NVidia GPU too. Best guess is that cuInit() somehow interferes with the AMD graphics driver on Windows, and switching the initialization order to do OpenCL first seems to solve the issue.	2017-10-08 18:36:02 +02:00
Brecht Van Lommel	cdb0b3b1dc	Code refactor: use DeviceInfo to enable QBVH and decoupled volume shading.	2017-10-08 13:17:33 +02:00
Brecht Van Lommel	23098cda99	Code refactor: make texture code more consistent between devices. * Use common TextureInfo struct for all devices, except CUDA fermi. * Move image sampling code to kernels/*/kernel_*_image.h files. * Use arrays for data textures on Fermi too, so device_vector<Struct> works.	2017-10-07 14:53:14 +02:00
Brecht Van Lommel	fb99ea79f8	Code refactor: split displace/background into separate kernels, remove luma.	2017-10-05 17:57:58 +02:00
Brecht Van Lommel	49199963bf	Fix incorrect CUDA remaining time estimate after previous commit.	2017-10-04 23:25:51 +02:00
Brecht Van Lommel	6da6f8d33f	Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL. The work size is still very conservative, and this doesn't help for progressive refine. For that we will need to render multiple tiles at the same time. But this should already help for denoising renders that require too much memory with big tiles, and just generally soften the performance dropoff with small tiles. Differential Revision: https://developer.blender.org/D2856	2017-10-04 21:58:47 +02:00
Brecht Van Lommel	12f4538205	Code refactor: use split variance calculation for mega kernels too. There is no significant difference in denoised benchmark scenes and denoising ctests, so might as well make it all consistent.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	e3e16cecc4	Code refactor: remove rng_state buffer and compute hash on the fly. A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	5b7d6ea54b	Code refactor: add WorkTile struct for passing work to kernel. This makes sharing some code between mega/split in following commits a bit easier, and also paves the way for rendering multiple tiles later.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	88520dd5b6	Code refactor: simplify CUDA context push/pop. Makes it possible to call a function like mem_alloc() when the context is already active. Also fixes some missing pops in case of errors.	2017-09-27 13:43:21 +02:00
Mai Lavelle	124ffb45a6	Cycles: Fix build with networking enabled	2017-08-30 00:19:44 -04:00
Sergey Sharybin	12d527f327	Cycles: Correct logging of used CPU intrinsics	2017-08-25 14:27:34 +02:00
Brecht Van Lommel	43a6cf1504	Cycles: attempt to recover from crashing CUDA/OpenCL drivers on Windows. I don't know if this will actually work, needs testing. Ref T52064.	2017-08-20 23:18:25 +02:00
Brecht Van Lommel	85ad248c36	Code cleanup: fix warning and improve terminology.	2017-08-12 13:18:05 +02:00
Sergey Sharybin	176ad9ecdd	Cycles: Remove ulong usage This is a bit confusing, especially when one mixes OpenCL code where ulong equals to uint64_t with CPU side code where ulong is expected to be something else from the naming. This commit makes it so we use explicit name, common on all platforms.	2017-08-09 14:08:58 +02:00
Mai Lavelle	55d28e604e	Cycles: Proper fix for recent OpenCL image crash Problem was that some code checks to see if device_pointer is null or not and the new allocator wasn't even setting the pointer to anything as it tracks memory location separately. Setting the pointer to non null keeps all users of device_pointer happy.	2017-08-09 04:27:39 -04:00
Sergey Sharybin	99c13519a1	Cycles: More fixes for Windows 32 bit - Apparently MSVC does not support compound literals in C++ (at least by the looks of it). - Not sure how opencl_device_assert was managing to set protected property of the Device class.	2017-08-08 22:32:51 +02:00
Sergey Sharybin	0e57282999	Cycles: Fix compilation error without C++11 Come on, folks, nobody considered master a C++11 only branch. Such a decision is to be made officially and will involve changes in quite a few infrastructure related areas.	2017-08-08 17:02:26 +02:00
Mai Lavelle	ec8ae4d5e9	Cycles: Pack kernel textures into buffers for OpenCL Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4 GB on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745	2017-08-08 07:12:04 -04:00
Sergey Sharybin	580741b317	Cycles: Cleanup, space after keyword	2017-08-07 14:47:51 +02:00

509 Commits