BLI_snprintf and related functions could truncate partial UTF8
code-points, which would then cause problems elsewhere -
Python raises an exception when accessing for example.
Existing uses of BLI_snprintf should use the UTF8 versions in most
cases, except for file paths which are not required to be UTF8.
A non UTF8 title on Wayland causes disconnection from the server.
Resolve by using the path as-is unless invalid UTF8 byte sequences
are found, in that case they're replaced by `?`.
This is skipped on WIN32 & macOS since the issue only exists on Wayland.
Similar to BLI_str_utf8_invalid_strip except that it substitutes
invalid characters and doesn't change the string length.
Useful for displaying strings that include invalid UTF8 code-points.
This commit enables Blender 4.5 to use (to some extent) blendfiles from
Blender 5.0 and later using 'long' ID names (i.e. ID names over 63 bytes).
On a more general perspective, it also introduces safer handling of
potentially corrupted ID names in a blendfile.
This is achieved by carefully checking for non-null terminated ID names
early on in readfile process, and then:
* Truncating and ensuring uniqueness of ID names.
* Doing similar process for action slot and slot users identifiers.
* In linking (and appending) context, such IDs are totally ignored. They are
not listed, and are considered as missing if some other (valid) linked ID
attempt to indirectly link them).
* Informing users through usual reporting ways.
Technically, this mainly changes two areas of the readfile code related to IDs
themselves:
* The utils `blo_bhead_id_name` that returns the ID name of an ID BHead,
without actually reading that ID, now check for a valid null-terminated
string of `MAX_ID_NAME` max size, and returns a `nullptr` on error.
_This essentially prevents listing and linking such IDs, in any way._
* The actual ID reading code (`read_id_struct`) does the same check, and
truncate the ID name to its maximum allowed length.
* Both of above checks also set a new FileData flag
(`FD_FLAGS_HAS_LONG_ID_NAME`), which is used to ensure that ID names (and
related actions slots identifiers) remain unique, and report to info to the
user.
Implements #137608.
Branched out from !137196.
Co-authored-by: michal.krupa <michal.krupa@cdprojektred.com>
Co-authored-by: Campbell Barton <ideasman42@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/139336
This PR adds new typographical line breaking opportunities, characters
that break depending on what follows. For example comma if not followed
by a space or a number. Period if not followed by a space or another
period. Along with related non-Latin equivalents, like Greek question
mark, full width punctuation, Hebrew, Arabic, etc.
Pull Request: https://projects.blender.org/blender/blender/pulls/137934
The PR allows specifying the method of line breaking. Current functions
gain an optional argument that defaults to "Minimal" which breaks as it
does now, only on space and line feed. "Typographical" breaks on
different types of spaces, backslashes, underscore, forward slash if
not following a number, dashes if not following a space, Chinese,
Japanese, and Korean Ideograms, and Tibetan intersyllabic marks. "Path"
breaks on space, underscore, and per-platform path separators. There is
also a "HardLimit" flag that forces breaking at character boundary at
the wrapping limit.
Pull Request: https://projects.blender.org/blender/blender/pulls/135203
Previously, there was a `StringRef.copy` method which would copy the string into
the given buffer. However, it was not defined for the case when the buffer was
too small. It moved the responsibility of making sure the buffer is large enough
to the caller.
Unfortunately, in practice that easily hides bugs in builds without asserts
which don't come up in testing much. Now, the method is replaced with
`StringRef.copy_utf8_truncated` which has much more well defined semantics and
also makes sure that the string remains valid utf-8.
This also renames `unsafe_copy` to `copy_unsafe` to make the naming more similar
to `copy_utf8_truncated`.
Pull Request: https://projects.blender.org/blender/blender/pulls/133677
The length checking wasn't accounting for null bytes within multi-byte
sequences and could step over the null bytes.
For BLI_strlen_utf8 this could result in an out of bounds read.
In practice most UTF8 data is validated so the extra checks
are mainly to prevent errors on invalid or corrupt UTF8 text.