test2

Author	SHA1	Message	Date
Lukas Stockner	fa3d50af95	Cycles: Improve denoising speed on GPUs with small tile sizes Previously, the NLM kernels would be launched once per offset with one thread per pixel. However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown. Therefore, the kernels are now launched in a single call that handles all offsets at once. This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory. On the other hand, of course, the smaller tiles significantly reduce the size of the memory. The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum. I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere. To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now. Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.	2017-11-30 07:37:08 +01:00
Maxym Dmytrychenko	7e349f2745	Cycles: improve triangle intersection performance. Reduces render time by about 1-2% in benchmark scenes. Differential Revision: https://developer.blender.org/D2911	2017-11-29 18:11:40 +01:00
Lukas Stockner	d8066fb0f1	Cycles: Refactor closure roughness detection to fix a potential bug with Denoising of specular shaders	2017-11-14 04:17:54 +01:00
Sergey Sharybin	d1a761c4d4	Cycles: Fix compilation error of standalone application	2017-11-13 10:49:05 +01:00
Sergey Sharybin	42dff6cc2e	Cycles: Fix compilation error with OIIO compiled against system PugiXML	2017-11-13 10:42:29 +01:00
Sergey Sharybin	db7a78a2be	Cycles: Fix compilation error with latest OIIO There were some changes to namespaces, which cause ambiguities. Replaces using namespace with the explicit symbols we need. It is a good idea to NOT pull in the whole namespace anyway!	2017-11-10 10:04:33 +01:00
Sergey Sharybin	46963f359d	Cycles: Bump version number to 1.9.0 This matches Blender Release 2.79.	2017-10-31 13:34:34 +01:00
Brecht Van Lommel	070a668d04	Code refactor: move more memory allocation logic into device API. * Remove tex_* and pixels_* functions, replace by mem_*. Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices. * No longer create device_memory and call mem_* directly, always go through device_only_memory, device_vector and device_pixels.	2017-10-24 01:25:19 +02:00
Brecht Van Lommel	57a0cb797d	Code refactor: avoid some unnecessary device memory copying.	2017-10-21 20:58:28 +02:00
Brecht Van Lommel	f61c340bc1	Cycles: OpenCL bicubic and tricubic texture interpolation support.	2017-10-08 02:55:44 +02:00
Brecht Van Lommel	23098cda99	Code refactor: make texture code more consistent between devices. * Use common TextureInfo struct for all devices, except CUDA fermi. * Move image sampling code to kernels/*/kernel_*_image.h files. * Use arrays for data textures on Fermi too, so device_vector<Struct> works.	2017-10-07 14:53:14 +02:00
Brecht Van Lommel	4537e85584	Fix T53001: more workarounds for crash in AMD compiler with recent drivers.	2017-10-05 17:57:58 +02:00
Brecht Van Lommel	18a353dd24	Fix T52368: Cycles OSL trace() failing on Windows 32 bit.	2017-09-20 19:38:08 +02:00
Sergey Sharybin	885c0a5f90	Cycles: Fix compilation warning	2017-09-04 13:28:15 +02:00
Brecht Van Lommel	1457e5ea73	Fix Cycles Windows render errors with BVH2 CPU rendering. One problem is that it was always using __mm_blendv_ps emulation even if the instruction was supported. The other that the emulation function was wrong. Thanks a lot to Ray Molenkamp for tracking this one down.	2017-08-29 22:55:35 +02:00
Sergey Sharybin	90299e4216	Cycles: Add utility function to query current value of scoped timer	2017-08-25 14:27:34 +02:00
Sergey Sharybin	436d1b4e90	Cycles: Fix issue with -0 being considered a non-finite value	2017-08-24 14:32:56 +02:00
Mai Lavelle	2540741dee	Fix implementation of atomic update max and move to a central location While unlikely to have had any serious effects because of limited use, the previous implementation was not actually atomic due to a data race and incorrectly coded CAS loop. We also had duplicates of this code in a few places, it's now been moved to a single location with all other atomic operations.	2017-08-23 06:54:25 -04:00
Brecht Van Lommel	296d74c4b1	Cycles: reorganize Performance panel layout, move viewport BVH type to debug.	2017-08-21 19:05:17 +02:00
Brecht Van Lommel	4d428d14af	Fix T52443: Cycles OpenCL build error after recent mesh lights changes.	2017-08-19 01:02:55 +02:00
Brecht Van Lommel	6919393a51	Fix T52372: CUDA build error after recent changes.	2017-08-12 20:37:06 +02:00
Brecht Van Lommel	d7639d57dc	Fix T52368: OSL trace() crash after recent changes.	2017-08-12 14:32:52 +02:00
Brecht Van Lommel	267e75158a	Fix T52322: denoiser broken on Windows after recent changes. It's not clear why this only happened on Windows, but the code was wrong and should do a bitcast here instead of conversion.	2017-08-11 01:09:35 +02:00
Sergey Sharybin	fd397a7d28	Cycles: Add utility macro ccl_ref It is defined to & for CPU side compilation, and defined to an empty for any GPU platform. The idea here is to use this macro instead of #ifdef block with bunch of duplicated lines just to make it so CPU code is efficient. Eventually we might switch to references on CUDA as well, but that would require some intensive testing.	2017-08-08 15:27:25 +02:00
Brecht Van Lommel	dc4d850d10	Fix Windows build errors with recent Cycles SIMD refactoring.	2017-08-07 17:54:26 +02:00
Sergey Sharybin	580741b317	Cycles: Cleanup, space after keyword	2017-08-07 14:47:51 +02:00
Brecht Van Lommel	ee77c1e917	Code refactor: use float4 instead of intrinsics for CPU denoise filtering. Differential Revision: https://developer.blender.org/D2764	2017-08-07 14:01:24 +02:00
Brecht Van Lommel	a24fbf3323	Code refactor: add, remove, optimize various SSE functions. * Remove some unnecessary SSE emulation defines. * Use full precision float division so we can enable it. * Add sqrt(), sqr(), fabs(), shuffle variations, mask(). * Optimize reduce_add(), select(). Differential Revision: https://developer.blender.org/D2764	2017-08-07 14:01:24 +02:00
Brecht Van Lommel	a8cc0d707e	Code refactor: split defines into separate header, changes to SSE type headers. I need to use some macros defined in util_simd.h for float3/float4, to emulate SSE4 instructions on SSE2. But due to issues with order of header includes this was not possible, this does some refactoring to make it work. Differential Revision: https://developer.blender.org/D2764	2017-08-07 14:01:24 +02:00
Sergey Sharybin	0d01cf4488	Cycles: Extra tweaks to performance of header expansion Two main things here: 1. Replace all characters unsafe for the #line directive in a single loop, avoiding multiple iterations and multiple temporary strings. 2. Don't merge tokens char by char but calculate the start and end points and then copy the whole substring at once. This gives about 15% speedup of source processing time. At this point (with all previous commits from today) we've shrunk the compiled sources size from 108 MB down to ~5.5 MB and lowered processing time from 4.5 sec down to 0.047 sec on my laptop running Linux (this was a constant time which Blender would always spend the first time loading a kernel, even if we've got a compiled clbin).	2017-08-03 08:07:06 +02:00
Sergey Sharybin	f879cac032	Cycles: Avoid some expensive operations in header expansions Basically gather lines as-is during traversal, avoiding allocating memory for all the lines in headers. Brings an additional performance improvement of about 20%.	2017-08-02 20:59:19 +02:00
Sergey Sharybin	a280697e77	Cycles: Support "precompiled" headers in include expansion algorithm The idea here is that it is possible to mark certain include statements as "precompiled" which means all subsequent includes of that file will be replaced with an empty string. This is a way to deal with tricky include pattern happening in single program OpenCL split kernel which was including bunch of headers about 10 times. This brings preprocessing time from ~1sec to ~0.1sec on my laptop.	2017-08-02 20:59:19 +02:00
Sergey Sharybin	4ad39964fd	Cycles: Speed up #include expansion algorithm The idea is to re-use files which were already processed. Gives about 4x speedup of processing time (~4.5sec vs 1.0sec) on my laptop for the whole OpenCL kernel. For users it will mean lower delay before OpenCL rendering might start.	2017-08-02 20:59:19 +02:00
Jeff Knox	e93804318f	Fix T51450: viewport render time keeps increasing after render is done. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2747	2017-07-25 01:47:04 +02:00
Brecht Van Lommel	db8bc1d982	Fix a few harmless maybe uninitialized warnings with GCC 5.4. GCC seems to detect uninitialized into function calls now, but then isn't always smart enough to see that it is actually initialized. Disabling this warning entirely seems a bit too much, so initialize a bit more now.	2017-07-21 00:54:58 +02:00
Mai Lavelle	9c3f1ad003	Cycles: Add artificial memory limit debug option for OpenCL	2017-07-06 05:25:46 -04:00
Mai Lavelle	95b345b2fe	Revert "Cycles: use std::min and max for extra overloads" We already have this in util_algorithm.h This reverts commit `cff172c762`.	2017-07-06 04:21:29 -04:00
Mai Lavelle	cff172c762	Cycles: use std::min and max for extra overloads	2017-07-05 19:43:34 -04:00
Sergey Sharybin	31f8ca5034	Cycles: Fix compilation error after recent logging changes This file uses std::ostream for helper << operators, so need to make sure corresponding header is included.	2017-07-05 20:40:55 +02:00
Sergey Sharybin	58c456b12d	Cycles: Fix compilation error when building without Glog and no C++11	2017-07-05 12:01:12 +02:00
Mai Lavelle	c8fa716c06	Cycles: Use float constants instead of double	2017-06-29 23:07:18 -04:00
Sergey Sharybin	794311c92b	Cycles: Fix race condition happening in progress utility This is not enough to mutex-guard modification code of integer values, since this operation is NOT atomic. This is not even safe for a single byte data types. For now guarded the getter functions, similar to other functions in this module. Ideally we want to switch modification to an atomic operations, so we wouldn't need any locks in the getters.	2017-06-16 10:22:35 +02:00
Mai Lavelle	4360e8ce13	Cycles: Add atomic decrement functions to util_atomic.h	2017-06-10 03:51:18 -04:00
Sergey Sharybin	6a546fc73e	Cycles: Don't leave multiple spaces in the device name	2017-06-08 12:15:24 +02:00
Sergey Sharybin	55c15ad9de	Cycles: Use fallthrough attribute to help catching missing break statements	2017-05-24 17:23:54 +02:00
Sergey Sharybin	38a2bf665b	Cycles: Cleanup, style and unused arguments - Some arguments were inappropriately tagged as unused using (void)foo semantic. Only use such semantic in tricky cases, when something needs to be ignored in release builds or something is dependent on tricky ifndef policy. For the rest of the cases just use void foo(int /*bar*/) semantic, which ensures the variable is not used. Solves confusion and code running out of sync with later development. - Used proper unused semantic for some arguments. - Added braces to make code easier to follow, tricky indentation with ifdef, uh.	2017-05-20 05:21:27 -07:00
Lukas Stockner	3dee1f079f	Fix T51560: Black pixels on a denoising render Once again, numerical instabilities causing the Cholesky decomposition to fail. However, further increasing the diagonal correction just because of a few pixels in very specific scenes and settings seems unjustified. Therefore, this commit simply falls back to the basic NLM-filtered pixel if the more advanced model fails.	2017-05-19 23:31:49 +02:00
Sergey Sharybin	ef549b9e55	Cycles: Cleanup, always use parenthesis Easier to read/follow, and more robust for the further changes.	2017-05-19 12:57:51 +02:00
Sergey Sharybin	908bb8bd82	Cycles: Cleanup, indentation in preprocessor	2017-05-19 12:54:46 +02:00
Sergey Sharybin	803337f3f6	Cycles: Cleanup, use ccl_restrict instead of ccl_restrict_ptr There were following issues with ccl_restrict_ptr: - We already had ccl_restrict for all platforms. - It was secretly adding `const` qualifier to the declaration, which is quite weird since non-const pointer can also be declared as restricted. - We never in Blender are using foo_ptr or FooPtr type definitions, so not sure why we should introduce such a thing here. - It is absolutely wrong from semantic point of view to put pointer into the restrict macro -- const is a part of type, not part of hint for compiler that some pointer is never aliased.	2017-05-19 12:41:03 +02:00

623 Commits