test2

Author	SHA1	Message	Date
Brecht Van Lommel	2d8c59ccb9	Fix T77095: fix Cycles performance regression with AMD RX cards Apply the workaround only for known problematic drivers. The latest pro driver appears to work correctly, hopefully the regular driver will as well once it is updated to the same OpenCL driver version (3075.13).	2020-06-30 12:01:40 +02:00
Brecht Van Lommel	fb68a30af6	Fix crash compiling Cycles OpenCL, after recent TBB changes	2020-06-26 17:44:24 +02:00
Campbell Barton	fd5c185beb	Cleanup: spelling	2020-06-25 23:14:36 +10:00
Brecht Van Lommel	b4e1571d0b	Cleanup: compiler warnings	2020-06-24 17:25:44 +02:00
Brecht Van Lommel	669befdfbe	Cycles: add Intel OpenImageDenoise support for viewport denoising Compared to Optix denoise, this is usually slower since there is no GPU acceleration. Some optimizations may still be possible, in avoid copies to the GPU and/or denoising less often. The main thing is that this adds viewport denoising support for computers without an NVIDIA GPU (as long as the CPU supports SSE 4.1, which is nearly all of them). Ref T76259	2020-06-24 15:17:36 +02:00
Brecht Van Lommel	0a3bde6300	Cycles: add denoising settings to the render properties Enabling render and viewport denoising is now both done from the render properties. View layers still can individually be enabled/disabled for denoising and have their own denoising parameters. Note that the denoising engine also affects how denoising data passes are output even if no denoising happens on the render itself, to make the passes compatible with the engine. This includes internal refactoring for how denoising parameters are passed along, trying to avoid code duplication and unclear naming. Ref T76259	2020-06-24 15:17:36 +02:00
Brecht Van Lommel	207338bb58	Cycles: port curve-ray intersection from Embree for use in Cycles GPU This keeps render results compatible for combined CPU + GPU rendering. Peformance and quality primitives is quite different than before. There are now two options: * Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for fast rendering. Hair curves are subdivided with a fixed number of user specified subdivisions. This gives relatively good results, especially when used with the Principled Hair BSDF and hair viewed from a typical distance. There are artifacts when viewed closed up, though this was also the case with all previous primitives (but different ones). * 3D Curve: render hair as 3D curve, for accurate results when viewing hair close up. This automatically subdivides the curve until it is smooth. This gives higher quality than any of the previous primitives, but does come at a performance cost and is somewhat slower than our previous Thick curves. The main problem here is performance. For CPU and OpenCL rendering performance seems usually quite close or better for similar quality results. However for CUDA and Optix, performance of 3D curve intersection is problematic, with e.g. 1.45x longer render time in Koro (though there is no equivalent quality and rounded ribbons seem fine for that scene). Any help or ideas to optimize this are welcome. Ref T73778 Depends on D8012 Maniphest Tasks: T73778 Differential Revision: https://developer.blender.org/D8013	2020-06-22 13:28:01 +02:00
Brecht Van Lommel	d1ef5146d7	Cycles: remove SIMD BVH optimizations, to be replaced by Embree Ref T73778 Depends on D8011 Maniphest Tasks: T73778 Differential Revision: https://developer.blender.org/D8012	2020-06-22 13:28:01 +02:00
Brecht Van Lommel	e50f1ddc65	Cycles: use TBB for task pools and task scheduler No significant performance improvement is expected, but it means we have a single thread pool throughout Blender. And it should make adding more parallellization in the future easier. After previous refactoring commits this is basically a drop-in replacement. One difference is that the task pool had a mechanism for scheduling tasks to the front of the queue to minimize memory usage. TBB has a smarter algorithm to balance depth-first and breadth-first scheduling of tasks and we assume that removes the need to manually provide hints to the scheduler. Fixes T77533	2020-06-22 13:27:37 +02:00
Brecht Van Lommel	54e3487c9e	Cleanup: remove task pool stop() and finished()	2020-06-22 13:06:47 +02:00
Brecht Van Lommel	b10b7cdb43	Cleanup: use lambdas instead of functors for task pools, remove threadid	2020-06-22 13:06:47 +02:00
Brecht Van Lommel	ace3268482	Cleanup: minor refactoring around DeviceTask	2020-06-22 13:06:47 +02:00
Brecht Van Lommel	6899cb3c07	Fix for T77095: work around render artifacts with AMD Radeon RX 4xx and 5xx	2020-06-18 14:41:51 +02:00
Brecht Van Lommel	fc7c34e380	Cleanup: fix compiler warnings	2020-06-17 14:36:51 +02:00
Patrick Mours	b586f801fc	Cycles: Improve CUDA and OptiX error reporting in the viewport This patch makes the infamous "Cancel" error in the viewport a thing of the past. Instead it now shows a more useful error message and streamlines the error handling process in CUDA. Reviewed By: brecht Differential Revision: https://developer.blender.org/D8008	2020-06-12 18:24:15 +02:00
Brecht Van Lommel	faf5f7b63d	Cleanup: fix compiler warning after recent changes It would be good to use override for all member functions, but doing it for only somes generates compiler warning.	2020-06-10 20:34:01 +02:00
Patrick Mours	f367f1e5a5	Cycles: Improve OptiX viewport denoising performance with CUDA rendering With this patch Cycles recognizing when a logical OptiX and CUDA device represent the same physical GPU and attempts to eliminate unnecessary tile copies for viewport rendering if that is the case for all active devices. In addition, denoising is now no longer performed on the first available OptiX device only, but instead it will try to match CUDA and OptiX rendering/denoising devices exactly to maximize utilization. Reviewed By: brecht Differential Revision: https://developer.blender.org/D7975	2020-06-10 14:12:13 +02:00
Patrick Mours	9f7d84b656	Cycles: Add support for P2P memory distribution (e.g. via NVLink) This change modifies the multi-device implementation to support memory distribution across devices, to reduce the overall memory footprint of large scenes and allow scenes to fit entirely into combined GPU memory that previously had to fall back to host memory. Reviewed By: brecht Differential Revision: https://developer.blender.org/D7426	2020-06-08 17:55:49 +02:00
Patrick Mours	473aaa389c	Cycles: Enable OptiX on all Maxwell+ GPUs	2020-06-05 12:33:00 +02:00
Patrick Mours	49c295813b	Merge branch 'blender-v2.83-release'	2020-05-27 15:31:03 +02:00
Patrick Mours	28d9368538	Fix T76947: Optix realtime denoiser progressively reduces brightness of very bright objects The input data to the OptiX denoiser was clamped to 0..10000 as required, but it could easily exceed that range with a high number of samples (since the data contains the overall sum). To fix that, divide by the number of samples first and multiply it back in after the denoiser ran.	2020-05-27 15:17:47 +02:00
Brecht Van Lommel	d9773edaa3	Cycles: code refactor to bake using regular render session and tiles There should be no user visible change from this, except that tile size now affects performance. The goal here is to simplify bake denoising in D3099, letting it reuse more denoising tiles and pass code. A lot of code is now shared with regular rendering, with the two main differences being that we read some render result passes from the bake API when starting to render a tile, and call the bake kernel instead of the path trace kernel. With this kind of design where Cycles asks for tiles from the bake API, it should eventually be easier to reduce memory usage, show tiles as they are baked, or bake multiple passes at once, though there's still quite some work needed for that. Reviewers: #cycles Subscribers: monio, wmatyjewicz, lukasstockner97, michaelknubben Differential Revision: https://developer.blender.org/D3108	2020-05-15 20:25:24 +02:00
Brecht Van Lommel	97f50c71b9	Fix --debug-cycles printing CUDA devices twice Reuse the CUDA devices list for Optix device detection.	2020-05-14 16:07:22 +02:00
Brecht Van Lommel	d97c83712c	Cycles: mark CUDA 10.2 as officially supported It appears to work fine after a recent bugfix and testing for the past few weeks.	2020-05-05 15:06:49 +02:00
Ray Molenkamp	aeb42cf8ab	Cycles/Optix: Support building the optix kernels on demand. CMake: `WITH_CYCLES_DEVICE_OPTIX` did not respect `WITH_CYCLES_CUDA_BINARIES` causing the optix kernel to be always build at build time. Code: `device_optix.cpp` did not count on the optix kernel not existing in the default location. For this to work, one should have before starting blender 1) working nvcc environment 2) Optix SDK installed and the OPTIX_ROOT_DIR environment variable pointing to it which is not set by default Differential Revision: https://developer.blender.org/D7400 Reviewed By: Brecht	2020-04-11 12:59:21 -06:00
Brecht Van Lommel	53981c7fb6	Cleanup: refactor adaptive sampling to more easily change some parameters No functional changes yet, this is work towards making CPU and GPU results match more closely.	2020-04-07 20:29:48 +02:00
Ray Molenkamp	58ea0d93f1	Cycles/Optix: Add CYCLES_OPTIX_TEST override This works similarly to the CYCLES_OPENCL_TEST environment variable to allow testing on unsupported hardware. Note: like the OPENCL test override, this is for testing only and bug reports on unsupported hardware will not be accepted at this point in time.	2020-03-26 11:30:17 -06:00
Brecht Van Lommel	f48d15a861	Cycles: limit number of processes compiling OpenCL kernel based on memory The numbers here can probably be tweaked to be better, but it's hard to predict and this should at least avoid excessive memory swapping. Fixes T57064.	2020-03-25 16:39:37 +01:00
Brecht Van Lommel	394a1373a0	Cycles: use OpenCL C 2.0 if available, to improve performance for AMD Tested with AMD Radeon Pro WX 9100, where it brings performance back to 2.80 level, and combined with recent changes is about 2-15% faster than 2.80 in our benchmark scenes. This somehow appears to specifically address the issue where adding more shader nodes leads to slower runtime. I found no additional speedup by applying this to change to 2.80 or removing the new shader node code. Ref T71479 Patch by Jeroen Bakker. Differential Revision: https://developer.blender.org/D6252	2020-03-24 20:09:36 +01:00
Ray Molenkamp	44c6b6615b	OpenCL: Bring back CYCLES_OPENCL_TEST override Back in 2.79 you could either use the debug panel or an environment variable to override using OpenCL for unsupported hardware. Which was rather useful for developers when testing on NVidia just to be sure the CL kernels at-least build properly. This broke in rB949ab753bb2 This diff restores testing though the CYCLES_OPENCL_TEST environment variable. Differential Revision: https://developer.blender.org/D7202 Reviewers: brecht	2020-03-21 11:55:45 -06:00
Dalai Felinto	2d1cce8331	Cleanup: `make format` after SortedIncludes change	2020-03-19 09:33:58 +01:00
Brecht Van Lommel	472534d16e	Fix memory leak in recent Cycles image texture refactor	2020-03-12 20:30:49 +01:00
Brecht Van Lommel	26bea849cf	Cleanup: add device_texture for images, distinct from other global memory There was too much image texture specific stuff in device_memory, and too much code duplication between devices.	2020-03-12 17:28:55 +01:00
Brecht Van Lommel	21821601f2	Fix Optix build error on Linux with some compilers	2020-03-11 20:35:38 +01:00
Brecht Van Lommel	f01bc597a8	Cleanup: stop encoding image data type in slot index This is legacy code from when we had a fixed number of textures.	2020-03-11 17:07:17 +01:00
Brecht Van Lommel	dcdcc23488	Fix T74504: Cycles wrong progress bar with CPU adaptive sampling	2020-03-06 23:46:58 +01:00
Brecht Van Lommel	b31b44c223	Fix error in Cycles Optix adaptive sampling after recent cleanup	2020-03-06 23:46:58 +01:00
Dalai Felinto	c60366c01f	Cycles: cleanup warning	2020-03-06 15:27:50 +01:00
Brecht Van Lommel	c8ac760c59	Cleanup: tweak Cycles #includes in preparation for clang-format sorting	2020-03-06 14:44:42 +01:00
Campbell Barton	8574d68aa0	Cleanup: spelling	2020-03-06 11:52:32 +11:00
Patrick Mours	88db9a17ce	Fix T74393: Cycles crashes when both OSL and Optix Denoising are enabled Enabling viewport denoising causes Cycles to use a multi-device, which always returned NULL when asked for OSL memory and would subsequently crash. This fixes that by returning the correct OSL memory pointer from the CPU device in the special viewport denoising multi-device.	2020-03-05 16:28:31 +01:00
Stefan Werner	51e898324d	Adaptive Sampling for Cycles. This feature takes some inspiration from "RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and "A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination" The basic principle is as follows: While samples are being added to a pixel, the adaptive sampler writes half of the samples to a separate buffer. This gives it two separate estimates of the same pixel, and by comparing their difference it estimates convergence. Once convergence drops below a given threshold, the pixel is considered done. When a pixel has not converged yet and needs more samples than the minimum, its immediate neighbors are also set to take more samples. This is done in order to more reliably detect sharp features such as caustics. A 3x3 box filter that is run periodically over the tile buffer is used for that purpose. After a tile has finished rendering, the values of all passes are scaled as if they were rendered with the full number of samples. This way, any code operating on these buffers, for example the denoiser, does not need to be changed for per-pixel sample counts. Reviewed By: brecht, #cycles Differential Revision: https://developer.blender.org/D4686	2020-03-05 12:21:38 +01:00
Patrick Mours	af54bbd61c	Cycles: Rework tile scheduling for denoising This fixes denoising being delayed until after all rendering has finished. Instead, tile-based denoising is now part of the "RENDER" task again, so that it is all in one task and does not cause issues with dedicated task pools where tasks are serialized. Reviewed By: brecht Differential Revision: https://developer.blender.org/D6940	2020-02-28 16:12:29 +01:00
Patrick Mours	0cea9353fd	Fix CUDA out of memory error with OptiX viewport denoising on small GPUs This makes the memory allocation for the denoiser state use the memory allocator in Cycles, which will evict textures to host memory when there is not enough space on the device. This means the allocation for the denoiser state won't just fail if there is no more space and instead more space is made for it to work. Also simplifies code somewhat.	2020-02-28 15:58:17 +01:00
Patrick Mours	a93043153a	Cleanup: Remove superfluous "cuda_device_ptr" function	2020-02-25 17:13:59 +01:00
Dalai Felinto	213b4f76ee	Cleanup: `make format`	2020-02-19 18:44:22 +01:00
Patrick Mours	2278aa0da9	Cycles: Add support for adaptive kernel compilation to OptiX device This modifies the common CUDA implementation for adaptive kernel compilation slightly to support both CUBIN and PTX output (the latter which is then used in the OptiX device). It also fixes adaptive kernel compilation on Windows. Reviewed By: brecht Differential Revision: https://developer.blender.org/D6851	2020-02-17 14:27:44 +01:00
Ray molenkamp	9339dc6dd1	Fix T70685: Cycles crash using WITH_CYCLES_NATIVE_ONLY on Windows MSVC does not have -march=native, so the kernel gets built without AVX2 and BVH8 support. The code assumed it to be available and crashed Differential Revision: https://developer.blender.org/D6082	2020-02-14 13:55:11 +01:00
Patrick Mours	0d750d7c06	Fix OptiX denoising when multiple CUDA streams are active	2020-02-13 15:22:26 +01:00
Patrick Mours	63bde1063f	Cleanup: Remove some unnecessary OptiX device code	2020-02-13 15:22:26 +01:00

1 2 3 4 5 ...

817 Commits