test2/source/blender/blenlib/BLI_math_half.hh
Aras Pranckevicius c6f5c89669 BLI: faster float<->half array conversions, use in Vulkan
In addition to float<->half functions to convert one number (#127708), add
float_to_half_array and half_to_float_array functions:
- On x64, this uses a 4-wide SSE2 implementation to do the conversion
  (2x faster half->float, 4x faster float->half compared to scalar).
  - There's also an AVX2 codepath that uses CPU hardware F16C instructions
    (8-wide), to be used if/when the Blender codebase starts being built
    for AVX2 (today it is not).
- On arm64, this uses NEON VCVT instructions to do the conversion.

Use these functions in Vulkan buffer/texture conversion code. Time taken to
convert float->half texture while viewing EXR file in image space (22M
numbers to convert): 39.7ms -> 10.1ms (would be 6.9ms if building for AVX2)

Pull Request: https://projects.blender.org/blender/blender/pulls/127838
2024-09-22 17:39:54 +02:00


/* SPDX-FileCopyrightText: 2024 Blender Authors
*
* SPDX-License-Identifier: GPL-2.0-or-later */
#pragma once
/** \file
* \ingroup bli
*/
#include <cstddef>
#include <cstdint>
namespace blender::math {
/**
* Float (FP32) <-> Half (FP16) conversion functions.
*
* Behavior matches hardware (x64 F16C, ARM NEON FCVT), including
* handling of denormals, infinities and NaNs, and rounding to
* nearest even. When a NaN is produced, the exact bit pattern
* might not match hardware, but it will still be a NaN.
*
* When compiling for ARM NEON (e.g. Apple Silicon),
* hardware VCVT instructions are used.
*
* For anything involving more than a handful of numbers,
* prefer #float_to_half_array and #half_to_float_array for
* performance.
*/
/**
* Converts float (FP32) number to half-precision (FP16).
*/
uint16_t float_to_half(float v);
/**
* Converts half-precision (FP16) number to float (FP32).
*/
float half_to_float(uint16_t v);
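/**
* Converts an array of `length` float (FP32) numbers in `src`
* to half-precision (FP16) numbers in `dst`.
* Uses a SIMD implementation where available (SSE2 on x64, NEON on arm64).
*/
void float_to_half_array(const float *src, uint16_t *dst, size_t length);
/**
* Converts an array of `length` half-precision (FP16) numbers in `src`
* to float (FP32) numbers in `dst`.
* Uses a SIMD implementation where available (SSE2 on x64, NEON on arm64).
*/
void half_to_float_array(const uint16_t *src, float *dst, size_t length);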
} // namespace blender::math
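
The header only declares the conversions, so the following is a minimal, portable scalar sketch of the behavior the comments describe (round to nearest even, denormals, infinities, NaNs). It is not the Blender implementation: the `ref_` names and helper functions are hypothetical and exist only for illustration, and NaN bit patterns may differ from what hardware or Blender produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

/* Hypothetical helpers: reinterpret bits without undefined behavior. */
static uint32_t ref_float_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof(u)); return u; }
static float ref_bits_float(uint32_t u) { float f; std::memcpy(&f, &u, sizeof(f)); return f; }

/* FP32 -> FP16, rounding to nearest even (illustrative reference only). */
uint16_t ref_float_to_half(float v)
{
  const uint32_t u = ref_float_bits(v);
  const uint16_t sign = uint16_t((u >> 16) & 0x8000u);
  const uint32_t mag = u & 0x7FFFFFFFu;
  if (mag > 0x7F800000u) {
    return uint16_t(sign | 0x7E00u); /* NaN: emit a quiet NaN. */
  }
  if (mag >= 0x47800000u) {
    return uint16_t(sign | 0x7C00u); /* >= 65536 or infinity: infinity. */
  }
  if (mag < 0x38800000u) { /* Below 2^-14: half denormal or zero. */
    if (mag < 0x33000000u) {
      return sign; /* Below 2^-25: rounds to signed zero. */
    }
    const uint32_t exponent = mag >> 23;
    const uint32_t significand = (mag & 0x7FFFFFu) | 0x800000u;
    const uint32_t shift = 126u - exponent; /* In range 14..24. */
    uint32_t result = significand >> shift;
    const uint32_t remainder = significand & ((1u << shift) - 1u);
    const uint32_t halfway = 1u << (shift - 1u);
    if (remainder > halfway || (remainder == halfway && (result & 1u))) {
      result++; /* May carry into the smallest normal half, which is correct. */
    }
    return uint16_t(sign | result);
  }
  /* Normal range: rebias exponent (127 -> 15), then round the 13 dropped bits. */
  uint32_t rounded = mag - 0x38000000u;
  rounded += 0xFFFu + ((rounded >> 13) & 1u);
  return uint16_t(sign | (rounded >> 13));
}

/* FP16 -> FP32; every half value is exactly representable as a float. */
float ref_half_to_float(uint16_t v)
{
  const uint32_t sign = uint32_t(v & 0x8000u) << 16;
  const uint32_t exponent = (v >> 10) & 0x1Fu;
  const uint32_t mantissa = v & 0x3FFu;
  if (exponent == 0x1Fu) { /* Infinity or NaN. */
    return ref_bits_float(sign | 0x7F800000u | (mantissa << 13));
  }
  if (exponent == 0u) { /* Zero or denormal: value is mantissa * 2^-24. */
    const float magnitude = float(mantissa) * (1.0f / 16777216.0f);
    return sign ? -magnitude : magnitude;
  }
  return ref_bits_float(sign | ((exponent + 112u) << 23) | (mantissa << 13));
}

/* Array variants as plain loops; the real functions use SIMD instead. */
void ref_float_to_half_array(const float *src, uint16_t *dst, size_t length)
{
  assert((src != nullptr && dst != nullptr) || length == 0);
  for (size_t i = 0; i < length; i++) {
    dst[i] = ref_float_to_half(src[i]);
  }
}

void ref_half_to_float_array(const uint16_t *src, float *dst, size_t length)
{
  assert((src != nullptr && dst != nullptr) || length == 0);
  for (size_t i = 0; i < length; i++) {
    dst[i] = ref_half_to_float(src[i]);
  }
}
```

A scalar reference like this is mainly useful for checking a SIMD path against: every non-NaN half value should survive a half -> float -> half round trip unchanged.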