test2/build_files/build_environment/utils/strip_libraries.py


69 lines
2.1 KiB
Python

Build: Make Linux Lib building reproducible

There are two parts to this PR.

The first changes parts of our build pipeline to make certain libs reproducible. For this part I want to clarify two things:

1. Why change Python to use `--disable-optimizations`?
   Because `--enable-optimizations` turns on PGO (Profile Guided Optimization). PGO is sadly not deterministic and will create different binaries on every recompile, so it needs to be turned off to get reproducible builds. It also seems to have been turned on only for Linux(?) on our side, so on Windows and Mac our Python build already doesn't have PGO.
2. Why split out Cython and zstandard from site-packages?
   Sadly pip does not seem to respect `SOURCE_DATE_EPOCH`. It also creates temporary folders with random hashes in their names, and those paths are then recorded in the Cython libraries (I'll touch on this again later). I've looked at the upstream discussions about this, and sadly the pip maintainers do not really want people to use pip as a reproducible build pipeline; they instead direct users to other solutions if they want reproducible builds.

The second part is about setting up our pipeline so that it does not introduce any random hashes or build timestamps into our libraries. Here I do two things:

1. Set the `SOURCE_DATE_EPOCH` environment variable to a specific date that will not change. This is needed because the compile-time date is recorded in certain libraries and files, so hard-coding it with this variable makes the end result reproducible.
2. Strip the created static and shared libraries. The static libraries are not created in a deterministic way, and some of our shared libraries include debug symbols which contain paths to temporary files with random hashes. To solve this without stripping afterwards, we would need to either patch the linker on Rocky8 or patch a lot of our libraries. I think it is better to just do this as a post-build step. (This seems to be what most Linux distributions do as well.)

With all this, we can make our Linux library builds almost 100% reproducible (at least on my machine, where I tested). By "almost", I mean that there is sadly a catch: certain libraries like Cython save the source code path in their libraries for error messages. However, the builds are now reproducible if the folder path is the same, i.e. if the libraries are always built in `/home/builder/build_linux/deps_x64`, then they should now be reproducible.

Pull Request: https://projects.blender.org/blender/blender/pulls/134221
2025-05-09 15:25:16 +02:00
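As a rough illustration of the `SOURCE_DATE_EPOCH` point in the description above, the sketch below pins the variable in the environment handed to a child build process and reads it back the way a timestamp-embedding tool conventionally would. The constant `PINNED_EPOCH` and the helpers `reproducible_env`, `build_timestamp` and `child_sees_epoch` are hypothetical names chosen for this example; they are not part of the Blender build scripts, and the date the real pipeline pins is not shown here.

```python
import os
import subprocess
import sys
import time

# Hypothetical pinned date (seconds since the Unix epoch).
# This is an assumption for the example, not the value Blender uses.
PINNED_EPOCH = 1700000000


def reproducible_env(epoch: int = PINNED_EPOCH) -> dict[str, str]:
    """Return a copy of the current environment with SOURCE_DATE_EPOCH pinned."""
    env = dict(os.environ)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def build_timestamp(env: dict[str, str]) -> int:
    """Use SOURCE_DATE_EPOCH when it is set, otherwise fall back to the wall clock."""
    value = env.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def child_sees_epoch(env: dict[str, str]) -> str:
    """Run a child process with the pinned environment and report what it sees."""
    output = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.environ['SOURCE_DATE_EPOCH'])"],
        env=env,
        text=True,
    )
    return output.strip()
```

Passing the environment explicitly (rather than mutating `os.environ`) keeps the pinned date scoped to the build steps that need it. Tools that ignore the variable, as the description says pip appears to, still need separate handling.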
#!/usr/bin/env python3
# SPDX-FileCopyrightText: 2025 Blender Authors
#
# SPDX-License-Identifier: GPL-2.0-or-later

"""
Script which strips all libraries in the given library directory.

This is so we don't keep any debug data or symbols that contain
random hashes that are not reproducible between builds.

This will strip both static and shared libraries.

Usage:

    strip_libraries.py <path/to/library/directory>
"""

import argparse
import subprocess
import sys
from pathlib import Path


def print_strip_lib(strip_lib: Path, prev_print_len: int) -> int:
    print_str = f"Stripping: {strip_lib}"
    if prev_print_len > 0:
        print(f"\r{' ' * prev_print_len}\r", end="")
    print(print_str, end="", flush=True)
    return len(print_str)


def strip_libs(strip_dir: Path) -> None:
    print(f"Stripping libraries in: {strip_dir}")
    prev_print_len = 0

    for shared_lib in strip_dir.rglob("*.so*"):
        if shared_lib.suffix == ".py":
            # Work around badly named `sycl` scripts.
            continue
        if shared_lib.is_symlink():
            # Don't strip symbolic-links as we don't want to strip the same library multiple times.
            continue
        prev_print_len = print_strip_lib(shared_lib, prev_print_len)
        subprocess.check_call(["strip", "-s", "--enable-deterministic-archives", shared_lib])

    for static_lib in strip_dir.rglob("*.a"):
        if static_lib.is_symlink():
            # Don't strip symbolic-links as we don't want to strip the same library multiple times.
            continue
        prev_print_len = print_strip_lib(static_lib, prev_print_len)
        subprocess.check_call(["objcopy", "--enable-deterministic-archives", static_lib])

    print("\nDone stripping libraries!")


def main() -> None:
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawTextHelpFormatter,
    )
    parser.add_argument("directory", type=Path, help="Path to the library directory to strip")
    args = parser.parse_args()

    if sys.platform == "linux":
        strip_libs(args.directory)


if __name__ == "__main__":
    main()
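Once the script above has been run on two independent builds, one way to check the reproducibility claim is to compare file digests across the two library trees. The following is a minimal sketch under that assumption, not part of the Blender tooling; the helper names `file_digest`, `tree_digests` and `differing_files` are hypothetical. Symbolic links are skipped here for the same reason the script skips them.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to `root`) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file() and not path.is_symlink()
    }


def differing_files(build_a: Path, build_b: Path) -> list[str]:
    """List relative paths that are missing from one tree or differ in content."""
    digests_a = tree_digests(build_a)
    digests_b = tree_digests(build_b)
    all_names = digests_a.keys() | digests_b.keys()
    return sorted(
        name for name in all_names if digests_a.get(name) != digests_b.get(name)
    )
```

An empty list from `differing_files(first_build, second_build)` would indicate that the two trees are byte-for-byte identical. Given the caveat in the description, both builds would need to use the same folder path for libraries such as Cython to match.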