Cycles: Apple Silicon optimization to specialize intersection kernels

The Metal backend now compiles and caches a second set of kernels that are
optimized for the scene contents. This is enabled on Apple Silicon.

The implementation supports this for both intersection and shading kernels.
However, it is currently enabled only for intersection kernels, which are
quick to compile and already give a good speedup. Enabling it for shading
kernels would be faster still, but it also causes long wait times and would
need a good user interface to control it.
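
As a rough illustration of why specializing for scene contents can help, the
sketch below contrasts a generic function that checks feature flags at run time
with a variant where the flags are compile-time constants. This is plain C++,
not Metal, and every name in it (`SCENE_HAS_TRIANGLES`,
`primitive_tests_generic`, `primitive_tests_specialized`) is hypothetical. It
does not show the mechanism the Metal backend actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits describing what a scene contains.
// These are illustrative only and are not the Cycles kernel feature flags.
constexpr uint32_t SCENE_HAS_TRIANGLES = 1u << 0;
constexpr uint32_t SCENE_HAS_CURVES = 1u << 1;
constexpr uint32_t SCENE_HAS_MOTION_BLUR = 1u << 2;

// Generic variant: the mask is a run-time value, so the code for every
// primitive type has to stay in the compiled function.
inline int primitive_tests_generic(uint32_t scene_features)
{
  int tests = 0;
  if ((scene_features & SCENE_HAS_TRIANGLES) != 0) {
    tests += 1;
  }
  if ((scene_features & SCENE_HAS_CURVES) != 0) {
    tests += 1;
  }
  if ((scene_features & SCENE_HAS_MOTION_BLUR) != 0) {
    tests += 1;
  }
  return tests;
}

// Specialized variant: the mask is a compile-time constant, so an optimizing
// compiler can drop the branches that this scene never takes.
template<uint32_t SceneFeatures> inline int primitive_tests_specialized()
{
  int tests = 0;
  if ((SceneFeatures & SCENE_HAS_TRIANGLES) != 0) {
    tests += 1;
  }
  if ((SceneFeatures & SCENE_HAS_CURVES) != 0) {
    tests += 1;
  }
  if ((SceneFeatures & SCENE_HAS_MOTION_BLUR) != 0) {
    tests += 1;
  }
  return tests;
}

// Both variants must return the same result; only the compiled code differs.
inline void check_variants_agree()
{
  assert(primitive_tests_generic(SCENE_HAS_TRIANGLES) ==
         primitive_tests_specialized<SCENE_HAS_TRIANGLES>());
}
```

The trade-off matches the one described above: each distinct set of scene
features needs its own compiled variant, which costs compile time up front.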

M1 Max samples per minute (macOS 13.0):

                     PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE

barbershop_interior         83.4                       89.5                   93.7
bmw27                     1486.1                     1671.0                 1825.8
classroom                  175.2                      196.8                  206.3
fishy_cat                  674.2                      704.3                  719.3
junkshop                   205.4                      212.0                  257.7
koro                       310.1                      336.1                  342.8
monster                    376.7                      418.6                  424.1
pabellon                   273.5                      325.4                  339.8
sponza                     830.6                      929.6                 1142.4
victor                      86.7                       96.4                   96.3
wdas_cloud                 111.8                      112.7                  183.1
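
To read the table as relative gains, the samples-per-minute figures can be
converted to a percentage over PSO_GENERIC. The helper below is a small sketch
for that arithmetic. The name `speedup_percent` is made up for illustration and
is not part of the commit.

```cpp
#include <cassert>

// Relative gain, in percent, of a specialized configuration over the generic
// one. Both arguments are samples per minute taken from the table above.
inline double speedup_percent(double generic_spm, double specialized_spm)
{
  assert(generic_spm > 0.0);
  return (specialized_spm / generic_spm - 1.0) * 100.0;
}
```

For bmw27, `speedup_percent(1486.1, 1671.0)` is about 12.4 for specialized
intersection, and `speedup_percent(1486.1, 1825.8)` is about 22.9 for
specialized shading. For wdas_cloud the intersection gain is under 1 percent
(111.8 to 112.7), while specialized shading gives about 63.8 percent.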

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D14645
Michael Jones
2022-07-12 15:32:46 +02:00
committed by Brecht Van Lommel
parent 5653c5fcdd
commit da4ef05e4d
13 changed files with 401 additions and 129 deletions

@@ -136,6 +136,19 @@ void string_replace(string &haystack, const string &needle, const string &other)
  }
}

void string_replace_same_length(string &haystack, const string &needle, const string &other)
{
  assert(needle.size() == other.size());
  size_t pos = 0;
  while (pos != string::npos) {
    pos = haystack.find(needle, pos);
    if (pos != string::npos) {
      memcpy(haystack.data() + pos, other.data(), other.size());
      pos += other.size();
    }
  }
}

string string_remove_trademark(const string &s)
{
  string result = s;
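
The `string_replace_same_length` helper added in this hunk overwrites each
match in place, so the haystack never changes size. The sketch below is a
self-contained copy for experimenting outside Cycles. It assumes plain
`std::string` in place of the Cycles `string` alias, and `example_patch` with
its input text is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Standalone copy of the helper, written against std::string.
void string_replace_same_length(std::string &haystack,
                                const std::string &needle,
                                const std::string &other)
{
  assert(needle.size() == other.size());
  size_t pos = 0;
  while (pos != std::string::npos) {
    pos = haystack.find(needle, pos);
    if (pos != std::string::npos) {
      // Overwrite the match in place; the string length is unchanged.
      std::memcpy(&haystack[0] + pos, other.data(), other.size());
      pos += other.size();
    }
  }
}

// Hypothetical usage: swap a fixed-width tag everywhere it occurs.
std::string example_patch()
{
  std::string text = "kernel_v1(); kernel_v1();";
  string_replace_same_length(text, "_v1", "_v2");
  return text;
}
```

One caveat when calling it: with an empty `needle`, `find` keeps returning the
same position and `pos` never advances, so the loop does not terminate. Callers
should pass a non-empty needle.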
@@ -164,6 +177,11 @@ string to_string(const char *str)
  return string(str);
}

string to_string(const float4 &v)
{
  return string_printf("%f,%f,%f,%f", v.x, v.y, v.z, v.w);
}

string string_to_lower(const string &s)
{
  string r = s;