- 
                Notifications
    
You must be signed in to change notification settings  - Fork 303
 
don't use --config=mkl for TensorFlow 2.4+ #2583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't use --config=mkl for TensorFlow 2.4+ #2583
Conversation
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 1 out of 1 (1 easyconfigs in total)  | 
    
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 0 out of 1 (1 easyconfigs in total)  | 
    
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 0 out of 1 (1 easyconfigs in total)  | 
    
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 1 out of 1 (1 easyconfigs in total)  | 
    
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 1 out of 1 (1 easyconfigs in total)  | 
    
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 2 out of 2 (1 easyconfigs in total)  | 
    
| 
           Test report by @Flamefire Overview of tested easyconfigs (in order)
 Build succeeded for 2 out of 2 (1 easyconfigs in total)  | 
    
| 
           More or less blocked by tensorflow/tensorflow#52151  | 
    
| 
           I consider this change important enough to not let it be blocked by the failing  Especially since this apparently fixes two performance issues: the threading oversubscription for CPU-only TensorFlow reported in #2577, but also   | 
    
| 
           Test report by @boegel Overview of tested easyconfigs (in order)
 Build succeeded for 3 out of 3 (3 easyconfigs in total)  | 
    
| 
           Test report by @boegel Overview of tested easyconfigs (in order)
 Build succeeded for 1 out of 3 (3 easyconfigs in total)  | 
    
| 
           Failing tests on systems with AMD CPUs due to tensorflow/tensorflow#52151, I'll open a PR to filter out those that broken test so we can proceed here...  | 
    
| 
           I get these two failing for TensorFlow-2.6.0-foss-2021a-CUDA-11.3.1.eb --include-easyblocks-from-pr 2583 Trying without this PR... And my original production build of it on that system combo did not fail and that was without this PR. Weird, I get that problem even without this PR now.  | 
    
| 
           @akesandgren I'd say let's open an issue on that?  | 
    
| 
           I will once I've figured out a bit more on what's happening... I think my container env wasn't fully correct for TF-CUDA when I initially built it, but now it is. It was previously lacking nvidia-smi and thus decided to skip the GPU tests. Something I only discovered today... And at least one of the failing tests may be due to too little memory on the K80, or that multiple tests are running at the same time and stealing memory from each other...  | 
    
| 
           Test report by @boegel Overview of tested easyconfigs (in order)
 Build succeeded for 3 out of 3 (3 easyconfigs in total)  | 
    
| 
           Test report by @boegel Overview of tested easyconfigs (in order)
 Build succeeded for 3 out of 3 (3 easyconfigs in total)  | 
    
| 
           @akesandgren Any additional updates? I would really like to move forward with this (but we also need easybuilders/easybuild-easyconfigs#14151 and easybuilders/easybuild-easyconfigs#14153 merged, first ideally).  | 
    
| 
           Yeah, most likely caused by K80's having too little memory. Works fine on our V100's so I'll just drop those two tests in our hooks when on K80's  | 
    
| 
           So in case I was unclear, the problem only appears on K80 and appears regardless of this PRs existance.  | 
    
| 
           With both easybuilders/easybuild-easyconfigs#14151 and easybuilders/easybuild-easyconfigs#14153 merged, I don't see any reason to hold this back any further...  | 
    
11264eb    to
    517129b      
    Compare
  
    | 
           Test report by @boegel Overview of tested easyconfigs (in order)
 Build succeeded for 2 out of 2 (2 easyconfigs in total)  | 
    
| 
           Is the subject still accurate here? Because we should do this for all 2.x versions in the CUDA builds.  | 
    
| 
           Test report by @boegel Overview of tested easyconfigs (in order)
 Build succeeded for 1 out of 1 (1 easyconfigs in total)  | 
    
(created using
eb --new-pr)fixes #2577, fixes easybuilders/easybuild-easyconfigs#14120