@felixc-arm (Contributor) commented May 2, 2025

This bug caused errors like:

/usr/bin/ld: ../../drivers/builtin/libbuiltin.so: undefined reference to `psa_asymmetric_encrypt'
/usr/bin/ld: ../../drivers/builtin/libbuiltin.so: undefined reference to `psa_crypto_driver_pake_get_user'
collect2: error: ld returned 1 exit status
make[2]: *** [programs/test/CMakeFiles/which_aes.dir/build.make:101: programs/test/which_aes] Error 1
make[1]: *** [CMakeFiles/Makefile2:1127: programs/test/CMakeFiles/which_aes.dir/all] Error 2

This happens when building with -DUSE_SHARED_TF_PSA_CRYPTO_LIBRARY=ON. The test component component_test_tf_psa_crypto_shared in component-build-system.sh, for example, should have picked it up; I found it locally on Ubuntu 22.

It wasn't caught by the CI, presumably because the CI runs that component on Ubuntu 16, whose older version of ld doesn't detect it.

Originally introduced in #199

PR checklist

  • changelog not required because: Bug fix in 4.0/1.0 work, not present in releases
  • framework PR not required
  • mbedtls development PR not required because: tf-psa-crypto change only
  • mbedtls 3.6 PR not required because: tf-psa-crypto change only
  • tests: added a test component whose support function requires gcc and a newer ld, which makes the CI run it on a newer Ubuntu, so that this error will be caught in the future

@felixc-arm added the bug, needs-review, needs-reviewer, priority-high, size-xs, needs-work and needs-ci labels and removed the needs-review and needs-reviewer labels May 2, 2025
@felixc-arm added the needs-review and needs-reviewer labels and removed the needs-work and needs-ci labels May 2, 2025
felixc-arm added 5 commits May 6, 2025 12:09
@felixc-arm (Contributor, Author)

The new job failed, as expected, on commit ac140e5 (https://mbedtls.trustedfirmware.org/blue/organizations/jenkins/mbed-tls-tf-psa-crypto-multibranch/detail/PR-277-head/4/pipeline/1199), so the new job does catch the error. Hopefully, once the bug fix is applied, the CI (including this new job) will be green.

@valeriosetti (Contributor) left a comment

I'm OK with the change introduced in the CMake file, but I have some doubts about the test component. I don't think we can force the Ubuntu/GCC/LD version used in the CI for a test component. From what I see, most of the tests are run on u16 and a few of them on higher versions (up to u22). If component_test_tf_psa_crypto_shared was meant to catch this problem but didn't, I guess it's because it's only tested on u16. What you should do instead is start a Docker instance to test that component on u22.
What you added here is a filter for component_build_tf_psa_crypto_shared_newer_ld_gcc, but since you cannot force the Docker version this component is tested on, I think u16 will be used and therefore it will never be executed.
To check my hypothesis, I grepped the tf-psa-crypto-outcomes.csv artifact and didn't find any reference to build_tf_psa_crypto_shared_newer_ld_gcc, so I think I'm right.

@felixc-arm (Contributor, Author)

@valeriosetti Hmm, I think there's something strange going on in tf-psa-crypto-outcomes.csv: I can't see any results in that file for tests that run on Ubuntu versions other than 16 (e.g. tf-psa-crypto-all_u18-tf_psa_crypto_check_code_style and tf-psa-crypto-all_u18-tf_psa_crypto_test_psa_compliance). Actually, I think that's probably because those components don't run the "actual" tests; they run Python scripts that check other things. The new component doesn't run tests either, it just checks the build, so none of them have test results that need to be put in that file.

The new test is also not in that file, but it is definitely being run on the CI: for example, the last job before the Windows jobs is tf-psa-crypto-all_u22-build_tf_psa_crypto_shared_newer_ld_gcc, and it also shows up in the timestamps.json artifact (hopefully the links don't die before I finish writing this...)

I was basing this work on a reply I got from @mpg:

If Ubuntu 22 is enough to reproduce the issue, we have it in the CI. If you can write an all.sh component that reproduces the issue, then you can force it to run on a recent enough Ubuntu by using a support_ function. You can have a look at existing support functions for inspiration. If we want to require Ubuntu >= 22 we can probably take inspiration from support_test_cmake_out_of_source, but I guess here it would probably be cleaner to request a recent enough version of ld if you've identified that to be the culprit.

So it's definitely running on Ubuntu 22 (also running & failing in the link in my first comment as I hadn't put in the CMake fix yet), but yes it is a bit counter-intuitive in the way you have to force it to use a newer OS or tool.

@valeriosetti (Contributor)

So it's definitely running on Ubuntu 22 (also running & failing in the link in my first comment as I hadn't put in the CMake fix yet), but yes it is a bit counter-intuitive in the way you have to force it to use a newer OS or tool.

Oh, you are right, I didn't see that line for tf-psa-crypto-all_u22-build_tf_psa_crypto_shared_newer_ld_gcc before (sorry!), so the test is definitely executed. Therefore my comment above is invalid.

I only have a minor proposal left: instead of naming the component build_tf_psa_crypto_shared_newer_ld_gcc, can't we pick something more specific than "newer"? "newer" is relative to the current state of things, so it doesn't seem very future-proof to me. WDYT?

--- Question to other reviewers ---
I wonder how this actually works: I checked all-core.sh and it seems to me that the support_ function is acting as a "filter" as I mentioned before (it's used to generate the SUPPORTED_COMPONENTS list). So how can the CI determine which tests are to be executed in which Docker image? Or, put differently, how can each Docker image >= 18 not re-execute tests which are already done for the u16 case?

# which_aes
tfpsacrypto_build_program_common(which_aes)
target_link_libraries(which_aes PRIVATE ${tfpsacrypto_target})
target_sources(which_aes PRIVATE $<TARGET_OBJECTS:tf_psa_crypto_test>)
Contributor

I think it would be better not to do that, as this hides the issue rather than fixing it. The core issue seems to be that we have two separate libraries, in the C compiler/linker sense, for the PSA core and the builtin driver, which reference each other, and how this is handled depends on the linker. We should probably have only one library, and define the builtin library as an object library in the CMake scripts. I will create an issue for that.

Contributor

Here is the issue: #300

@mpg (Contributor) commented May 22, 2025

I wonder how this actually works: I checked all-core.sh and it seems to me that the support_ function is acting as a "filter" as I mentioned before (it's used to generate the SUPPORTED_COMPONENTS list). So how can the CI determine which tests are to be executed in which Docker image? Or, put differently, how can each Docker image >= 18 not re-execute tests which are already done for the u16 case?

The magic for that lives in the mbedtls-test repo, specifically this function. Basically, it runs all.sh --list-all-components somewhere to get a complete list of components. It then has an ordered list of platforms, runs all.sh --list-components on each platform to know what each platform supports, and for each component picks the first platform in the list that supports it.

That's most of what the pre-test-checks stage does (after loading and/or rebuilding the Docker images, and before running the actual tests). If one component from --list-all-components has no platform that supports it, the run is aborted with an error. Otherwise, we move to the testing stage where for each component, all.sh <component_name> is executed on the platform that was determined for this component earlier.

These days the list has Ubuntu 16 at the front, so all all.sh components run on Ubuntu 16 unless their support function says that's not possible. In that case, the other platforms from the list are tried until a suitable one is found; here, Ubuntu 22 will be the first one.
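The assignment described above can be modelled in a few lines of bash. This is a minimal sketch, not the mbedtls-test code: the platform names, the per-platform component lists and the function name assign_platforms are all hypothetical, and the lists are hard-coded where the real CI would obtain them by running all.sh --list-all-components once and all.sh --list-components in each Docker image.

```shell
#!/usr/bin/env bash
# Sketch of the component-to-platform assignment (requires bash >= 4
# for associative arrays). All data below is hypothetical.

# Ordered platform list: earlier entries are preferred.
platforms=(u16 u18 u20 u22)

# What `all.sh --list-components` might report on each platform.
declare -A supported=(
    [u16]="build_default test_default"
    [u18]="build_default test_default check_code_style"
    [u20]="build_default test_default check_code_style"
    [u22]="build_default test_default check_code_style build_shared_newer_ld"
)

# What `all.sh --list-all-components` might report.
all_components="build_default test_default check_code_style build_shared_newer_ld"

# Print "<component> <platform>" for each component, picking the first
# platform that supports it. Fail (like the pre-test-checks stage) if a
# component is supported nowhere.
assign_platforms () {
    local component platform chosen
    for component in $all_components; do
        chosen=
        for platform in "${platforms[@]}"; do
            # Surrounding spaces make this a whole-word match.
            if [[ " ${supported[$platform]} " == *" $component "* ]]; then
                chosen=$platform
                break
            fi
        done
        if [[ -z $chosen ]]; then
            echo "error: no platform supports $component" >&2
            return 1
        fi
        echo "$component $chosen"
    done
}
```

With the sample data, assign_platforms prints build_default u16, test_default u16, check_code_style u18 and build_shared_newer_ld u22, one pair per line: a component only leaves Ubuntu 16 when no earlier platform in the list supports it.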

(This is for Linux, I think FreeBSD is handled differently: it runs a hardcoded subset of all.sh that also runs on Linux. Also, even on Linux there are some more subtleties, like components with _release_ in their name being excluded from PR jobs, etc.)

I hope that helps. Feel free to browse the code and/or ask more questions obviously!

@mpg (Contributor) commented May 22, 2025

Actually, I think that's probably because those components don't run the "actual" tests; they run Python scripts that check other things. The new component doesn't run tests either, it just checks the build, so none of them have test results that need to be put in that file.

Yes, currently only test results are recorded in the outcomes file, not build results, which is annoying for all kinds of reasons, including this one. As usual, Gilles has a PR for that, which we have not been reviewing, so it hasn't been merged and we keep running into the issue... If you feel like reviewing it, it would help :)

@gilles-peskine-arm (Contributor)

Hmm, I think there's something strange going on in tf-psa-crypto-outcomes.csv: I can't see any results in that file for tests that run on Ubuntu versions other than 16 (e.g. tf-psa-crypto-all_u18-tf_psa_crypto_check_code_style and tf-psa-crypto-all_u18-tf_psa_crypto_test_psa_compliance). Actually, I think that's probably because those components don't run the "actual" tests; they run Python scripts that check other things. The new component doesn't run tests either, it just checks the build, so none of them have test results that need to be put in that file.

Right. Currently the outcome file only records outcomes from running unit tests, ssl-opt.sh and compat.sh. Other things, such as builds, are not recorded (there's a stalled PR for that).

I checked all-core.sh and it seems to me that the support_ function is acting as a "filter" as I mentioned before

Right, this is documented in all.sh (and only there I think, it would be useful to have a more comprehensive and discoverable document somewhere).
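A minimal sketch of that filtering, assuming a simplified model rather than the actual all-core.sh code: a component is kept unless a support_<component> function exists and returns failure. ALL_COMPONENTS, HAVE_RECENT_LD and the component names below are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of support_<component> acting as a filter.
# Hypothetical names throughout; this is not the all-core.sh source.

ALL_COMPONENTS="build_default build_shared_recent_ld"

# Stand-in for a real tool-version check on the current machine.
HAVE_RECENT_LD=${HAVE_RECENT_LD:-no}

# Only this component has a support_ function; the other one is
# assumed to be runnable anywhere.
support_build_shared_recent_ld () {
    [ "$HAVE_RECENT_LD" = yes ]
}

# Print, one per line, the components this machine can run.
list_supported_components () {
    local component
    for component in $ALL_COMPONENTS; do
        # Skip the component only if a support_ function exists and fails.
        if declare -F "support_$component" >/dev/null &&
           ! "support_$component"; then
            continue
        fi
        echo "$component"
    done
}
```

With HAVE_RECENT_LD=no, list_supported_components prints only build_default; with HAVE_RECENT_LD=yes it prints both components. Combined with the first-supporting-platform rule in the mbedtls-test repo, that filter is what pushes a component onto a newer Docker image.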

fi

# Newer lds with gcc cause extra errors that wouldn't be caught with older versions
[ "$distrib_id" != "Ubuntu" ] || [ "$ld_version_minor" -gt 37 ]
@gilles-peskine-arm (Contributor), Jun 24, 2025

Why does the exact Linux distribution matter? As far as I understand, there are three possibilities:

  • Recent GNU binutils — good.
  • Old GNU binutils — bad.
  • Non-GNU ld — good? Or bad, since we can't be sure they would catch the problem? It's a bit of a moot point anyway since we run every component on Ubuntu (and even if we create non-Ubuntu components one day, we'd still keep Ubuntu as a possibility).

If we just want a recent enough GNU binutils, given that we can assume Ubuntu, we can just check the version of binutils.

support_build_tf_psa_crypto_shared_newer_ld_gcc () {
    # Require a recent enough GNU binutils, because older ones are more
    # permissive and would not detect e.g.
    # https://github.com/Mbed-TLS/TF-PSA-Crypto/issues/300
    dpkg --compare-versions "$(dpkg-query --show --showformat '${Version}' binutils)" '>=' 2.38 2>/dev/null
}
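The diff context above compares an ld_version_minor variable whose derivation isn't shown in this thread. As a sketch only, assuming the first line of ld --version looks like "GNU ld (GNU Binutils for Ubuntu) 2.38", such a value could be extracted in pure shell without dpkg. The function name ld_minor_from_banner is hypothetical, and unlike the dpkg check above it ignores the major version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the minor version of GNU ld given the first
# line of `ld --version`, e.g. "GNU ld (GNU Binutils for Ubuntu) 2.38"
# prints "38". Returns 1 for anything that does not look like GNU ld.
# Assumes the version number is the last word of that line.
ld_minor_from_banner () {
    local banner=$1 version
    case $banner in
        "GNU ld "*) ;;
        *) return 1 ;;
    esac
    version=${banner##* }        # last word: "2.38" or "2.26.1"
    version=${version#*.}        # drop the major part: "38" or "26.1"
    echo "${version%%[!0-9]*}"   # keep the leading digits: "38" or "26"
}

# Intended use (needs GNU ld on PATH), mirroring the `-gt 37` check:
#   minor=$(ld_minor_from_banner "$(ld --version | head -n 1)") &&
#       [ "$minor" -gt 37 ]
```

On Ubuntu, the dpkg-based check suggested above is simpler and compares the full version; parsing the banner like this only helps where dpkg is unavailable.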

@mpg (Contributor) commented Aug 4, 2025

Is it still relevant or can it be closed now that #300 (which according to Ronald was the root of the problem) has been resolved?

@felixc-arm closed this Aug 4, 2025
@felixc-arm (Contributor, Author)

Yep, it shouldn't be needed any more.


Labels

bug, needs-review, needs-reviewer, priority-high, size-xs

5 participants