- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.9k
rustc_codegen_llvm: give names to non-alloca variable values. #64149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| r? @zackmdavis (rust_highfive has picked a reviewer for you, use r? to override) | 
03d2d71    to
    01923d5      
    Compare
  
    | I'm excited to have fewer unnamed values. However, I'm not sure how I feel about squashing all names together in a (potentially) long string. While I can see it being sometimes valuable to immediately see that e.g. two source level variables are mapped to the same SSA values, it could also be irrelevant noise. At the same time, LLVM optimizations won't even try to do something similar (they'll just pick one of the multiple equal values, and not even necessarily preserve the name if transforming the instructions at all). While this limits the noise, it also means the "ah, these two variables got combined into one" effect is much less reliable when you look at IR after some LLVM optimizations, which in my experience is >95% of the time. I think it would be good enough if we kept using only one name, at least if together with other changes to ensure it is a meaningful name (i.e., avoid nonsense like the  | 
| I’m ambivalent. If this is helping @eddyb to do their debugging, I’m all for making this happen. I can always use  | 
| The reason I did this wasn't so much about reading LLVM IR, tbh, but rather I wanted to try out making something debuginfo-related more uniform (value names, in this case) between arguments and other variables. I suppose a more interesting thing to do here would be to add  If this is undesirable, I can probably continue without, or maybe just remove the second commit. | 
| The first commit is good IMO, both for IR readability and for debuginfo. If nobody cares strongly for the second commit then let's just drop the second one and merge the first. (Well, and probably rewrite the FIXME that presupposes we'll want to combine names). | 
01923d5    to
    eedf555      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me except one possible nit, if the "arg or inst" check can be an assertion please make it one.
| let old_name = unsafe { | ||
| CStr::from_ptr(llvm::LLVMGetValueName(value)) | ||
| }; | ||
| match old_name.to_str() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was gonna suggest writing this match as if old_name.is_empty() for readability (+ to avoid having to care about UTF-8) but actually CStr doesn't have that 🤷♀
FYI while annoyed about this I also noticed that we can use better APIs for touching LLVM value names (see #64223) but that's unrelated to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I forgot to bring that up! I also noticed those functions.
| llvm::LLVMIsAInstruction(value).is_some() | ||
| }; | ||
| if !param_or_inst { | ||
| return; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first instinct would be to make this a bug!(), not a silent nop. Are there any callers (now or after more debuginfo work) that might good reason need to pass in something other than an instruction or argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yeah, Constants (which would be a nop), or, worse, GlobalValues (functions/statics), which would get their mangled names replaced.
That is, I'm pretty sure let x = foo as fn(); and let y = &STATIC; would both allow accidental renaming of foo's or STATIC's symbol names without this return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. Well, silent nop it is then.
| @bors r+ | 
| 📌 Commit eedf555 has been approved by  | 
rustc_codegen_llvm: give names to non-alloca variable values.
These names only matter when looking at LLVM IR, but they can help.
When one value is used for multiple variables, I decided to combine the names.
I chose `,` as a separator but maybe `=` or ` ` (space) are more appropriate.
(LLVM names can contain any characters - if necessary they end up having quotes)
As an example, this function:
```rust
#[no_mangle]
pub fn test(a: u32, b: u32) -> u32 {
    let c = a + b;
    let d = c;
    let e = d * a;
    e
}
```
Used to produce this LLVM IR:
```llvm
define i32 @test(i32 %a, i32 %b) unnamed_addr #0 {
start:
  %0 = add i32 %a, %b
  %1 = mul i32 %0, %a
  ret i32 %1
}
```
But after this PR you get this:
```llvm
define i32 @test(i32 %a, i32 %b) unnamed_addr #0 {
start:
  %"c,d" = add i32 %a, %b
  %e = mul i32 %"c,d", %a
  ret i32 %e
}
```
cc @nagisa @rkruppe
    Rollup of 10 pull requests Successful merges: - #63919 (Use hygiene for AST passes) - #63927 (Filter linkcheck spurious failure) - #64149 (rustc_codegen_llvm: give names to non-alloca variable values.) - #64192 (Bail out when encountering likely missing turbofish in parser) - #64231 (Move the HIR CFG to `rustc_ast_borrowck`) - #64233 (Correct pluralisation of various diagnostic messages) - #64236 (reduce visibility) - #64240 (Include compiler-rt in the source tarball) - #64241 ([doc] Added more prereqs and note about default directory) - #64243 (Move injection of attributes from command line to `libsyntax_ext`) Failed merges: r? @ghost
These names only matter when looking at LLVM IR, but they can help.
When one value is used for multiple variables, I decided to combine the names.
I chose
,as a separator but maybe=or(LLVM names can contain any characters - if necessary they end up having quotes)
As an example, this function:
Used to produce this LLVM IR:
But after this PR you get this:
cc @nagisa @rkruppe