When the BMesh source and result arguments are the same, restore
performance lost by 9175d9b7c2, which made copying layers
have quadratic time complexity. When we know the custom data format
is the same between the source and result, the copying can be much
simpler, so it's worth specializing this case. There is still more
to be done, because often we only know that the two meshes are the
same at runtime. A followup commit will add that check.
The quadratic runtime means performance is fine for low layer counts,
and terrible with higher layer counts. For example, in my testing with
47 boolean attributes, copying 250k vertices went from 2.3 seconds to
316 ms.
The implementation uses a new CustomData function that copies an entire
BMesh custom data block, called by a function in the BMesh module
overloaded for every BMesh element type. That handles the extra data
like flags, normals, and material indices.
Related to #115776