ChrisDenton
Member

As a minor optimization, we can skip the runtime UTF-8 to UTF-16 conversion.
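
To illustrate the idea, here is a minimal sketch contrasting a runtime conversion with a pre-encoded constant. The names `runtime_wide` and `MAIN_WIDE` are hypothetical and are not taken from this PR; the constant is written out by hand purely for illustration.

```rust
/// Runtime conversion: re-encodes and allocates every time it is called.
fn runtime_wide(name: &str) -> Vec<u16> {
    // `str::encode_utf16` is the standard runtime API; append the terminating null.
    name.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Pre-encoded alternative (hypothetical name): the UTF-16 units are already
/// in the binary, so no work is needed at startup.
const MAIN_WIDE: &[u16] = &['m' as u16, 'a' as u16, 'i' as u16, 'n' as u16, 0];

fn main() {
    // Both approaches yield the same null-terminated UTF-16 data.
    assert_eq!(runtime_wide("main"), MAIN_WIDE);
    println!("{:?}", MAIN_WIDE);
}
```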

@rustbot
Collaborator

rustbot commented Apr 6, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot added the labels O-windows (Operating system: Windows), S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties), and T-libs (Relevant to the library team, which will review and decide on the PR/issue) on Apr 6, 2024
Comment on lines +81 to +131
/// Const convert UTF-8 to UTF-16, for use in the wide_str macro.
///
/// Note that this is designed for use in const contexts so is not optimized.
pub const fn to_utf16<const UTF16_LEN: usize>(s: &str) -> [u16; UTF16_LEN] {
    let mut output = [0_u16; UTF16_LEN];
    let mut pos = 0;
    let s = s.as_bytes();
    let mut i = 0;
    while i < s.len() {
        match s[i].leading_ones() {
            // Decode UTF-8 based on its length.
            // See https://en.wikipedia.org/wiki/UTF-8
            0 => {
                // ASCII is the same in both encodings
                output[pos] = s[i] as u16;
                i += 1;
                pos += 1;
            }
            2 => {
                // Bits: 110xxxxx 10xxxxxx
                output[pos] = ((s[i] as u16 & 0b11111) << 6) | (s[i + 1] as u16 & 0b111111);
                i += 2;
                pos += 1;
            }
            3 => {
                // Bits: 1110xxxx 10xxxxxx 10xxxxxx
                output[pos] = ((s[i] as u16 & 0b1111) << 12)
                    | ((s[i + 1] as u16 & 0b111111) << 6)
                    | (s[i + 2] as u16 & 0b111111);
                i += 3;
                pos += 1;
            }
            4 => {
                // Bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                let mut c = ((s[i] as u32 & 0b111) << 18)
                    | ((s[i + 1] as u32 & 0b111111) << 12)
                    | ((s[i + 2] as u32 & 0b111111) << 6)
                    | (s[i + 3] as u32 & 0b111111);
                // re-encode as UTF-16 (see https://en.wikipedia.org/wiki/UTF-16)
                // - Subtract 0x10000 from the code point
                // - For the high surrogate, shift right by 10 then add 0xD800
                // - For the low surrogate, take the low 10 bits then add 0xDC00
                c -= 0x10000;
                output[pos] = ((c >> 10) + 0xD800) as u16;
                output[pos + 1] = ((c & 0b1111111111) + 0xDC00) as u16;
                i += 4;
                pos += 2;
            }
            // valid UTF-8 cannot have any other values
            _ => unreachable!(),
        }
    }
    output
}
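
As a worked example of the 4-byte branch, the sketch below decodes U+1F600 (UTF-8 bytes F0 9F 98 80) with the same masks and shifts, then re-encodes it as a surrogate pair. The helper name `surrogate_pair` is an assumption made for illustration and is not part of the PR; the result is cross-checked against the standard library's runtime `char::encode_utf16`.

```rust
/// Split a code point at or above 0x10000 into a UTF-16 surrogate pair.
/// (Hypothetical helper; it mirrors the arithmetic in the 4-byte branch.)
const fn surrogate_pair(code_point: u32) -> [u16; 2] {
    let c = code_point - 0x10000;
    // High surrogate: top 10 bits plus 0xD800. Low surrogate: low 10 bits plus 0xDC00.
    [((c >> 10) + 0xD800) as u16, ((c & 0x3FF) + 0xDC00) as u16]
}

fn main() {
    // U+1F600 is encoded in UTF-8 as the four bytes F0 9F 98 80.
    let bytes = "\u{1F600}".as_bytes();
    let cp = ((bytes[0] as u32 & 0b111) << 18)
        | ((bytes[1] as u32 & 0b111111) << 12)
        | ((bytes[2] as u32 & 0b111111) << 6)
        | (bytes[3] as u32 & 0b111111);
    assert_eq!(cp, 0x1F600);

    // 0x1F600 - 0x10000 = 0xF600; 0xF600 >> 10 = 0x3D; 0xF600 & 0x3FF = 0x200.
    assert_eq!(surrogate_pair(cp), [0xD83D, 0xDE00]);

    // Cross-check against the standard library's runtime encoder.
    let mut buf = [0u16; 2];
    let encoded = '\u{1F600}'.encode_utf16(&mut buf);
    assert_eq!(&surrogate_pair(cp)[..], &encoded[..]);
}
```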
Member

Nice work. I feel like at least some of this should be using more public std API instead of a bunch of sorcerous isopsephia, but I looked for equivalents and couldn't find any in the stdlib, so this will do for now.

Member

@workingjubilee workingjubilee left a comment

r=me with comment

@ChrisDenton
Member Author

@bors r=workingjubilee

@bors
Collaborator

bors commented Apr 9, 2024

📌 Commit 614e793 has been approved by workingjubilee

It is now in the queue for this repository.

bors added the label S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the label S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) on Apr 9, 2024
@workingjubilee
Member

Okay, following fmease's explanation I think using a decl macro would be fine since we're std and get to use nightly features when we want:
@bors r-

bors added the label S-waiting-on-author (Status: This is awaiting some action, such as code changes or more information, from the author) and removed the label S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) on Apr 9, 2024
`wide_str!` creates a null-terminated UTF-16 string, whereas `utf16!` just creates a UTF-16 string without appending a null.
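
The difference between the two macros can be sketched as follows. This is not the code from the PR: it uses stable `macro_rules!` rather than macros 2.0, the helpers `utf16_len` and `to_utf16` are compact stand-ins written for this example, and the sketch does not reject strings with interior nulls.

```rust
/// Number of UTF-16 code units needed for `s` (hypothetical helper).
const fn utf16_len(s: &str) -> usize {
    let s = s.as_bytes();
    let mut i = 0;
    let mut len = 0;
    while i < s.len() {
        // Count every byte that is not a continuation byte (10xxxxxx)...
        if s[i] & 0b1100_0000 != 0b1000_0000 {
            len += 1;
        }
        // ...and once more for 4-byte lead bytes, which become surrogate pairs.
        if s[i] >= 0b1111_0000 {
            len += 1;
        }
        i += 1;
    }
    len
}

/// Compact const UTF-8 to UTF-16 conversion (stand-in for the PR's function).
const fn to_utf16<const N: usize>(s: &str) -> [u16; N] {
    let s = s.as_bytes();
    let mut out = [0u16; N];
    let mut i = 0;
    let mut pos = 0;
    while i < s.len() {
        // Sequence length and lead-byte payload mask, from the leading one bits.
        let (len, mask): (usize, u8) = match s[i].leading_ones() {
            0 => (1, 0x7F),
            2 => (2, 0x1F),
            3 => (3, 0x0F),
            4 => (4, 0x07),
            _ => panic!("invalid UTF-8 lead byte"),
        };
        let mut c = (s[i] & mask) as u32;
        let mut k = 1;
        while k < len {
            // Each continuation byte contributes its low 6 bits.
            c = (c << 6) | (s[i + k] & 0x3F) as u32;
            k += 1;
        }
        if c < 0x10000 {
            out[pos] = c as u16;
            pos += 1;
        } else {
            let c = c - 0x10000;
            out[pos] = 0xD800 + (c >> 10) as u16;
            out[pos + 1] = 0xDC00 + (c & 0x3FF) as u16;
            pos += 2;
        }
        i += len;
    }
    out
}

/// UTF-16 at compile time, with no terminator.
macro_rules! utf16 {
    ($s:expr) => {{
        const UTF8: &str = $s;
        const LEN: usize = utf16_len(UTF8);
        const UTF16: [u16; LEN] = to_utf16(UTF8);
        &UTF16
    }};
}

/// Same as `utf16!`, but with a trailing null appended to the literal.
macro_rules! wide_str {
    ($s:literal) => {
        utf16!(concat!($s, '\0'))
    };
}

fn main() {
    let plain: &[u16] = utf16!("héllo 😀");
    let wide: &[u16] = wide_str!("main");
    assert_eq!(plain, "héllo 😀".encode_utf16().collect::<Vec<_>>());
    assert_eq!(wide, [b'm' as u16, b'a' as u16, b'i' as u16, b'n' as u16, 0]);
}
```

Building `wide_str!` on top of `utf16!` keeps the null terminator a property of the input literal, so the conversion itself stays unaware of it.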
@ChrisDenton
Member Author

Ok, I've rewritten it to use macros 2.0. I did the same for both macros for the sake of consistency.

@workingjubilee
Member

Yay! (I don't mean to be annoying, I just don't think we should embrace a fragile proliferation of underscores if we don't have to.)

@bors r+

@bors
Collaborator

bors commented Apr 9, 2024

📌 Commit 19f04a7 has been approved by workingjubilee

It is now in the queue for this repository.

bors added the label S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the label S-waiting-on-author (Status: This is awaiting some action, such as code changes or more information, from the author) on Apr 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 10, 2024
…llaumeGomez

Rollup of 7 pull requests

Successful merges:

 - rust-lang#118391 (Add `REDUNDANT_LIFETIMES` lint to detect lifetimes which are semantically redundant)
 - rust-lang#123534 (Windows: set main thread name without re-encoding)
 - rust-lang#123659 (Add support to intrinsics fallback body)
 - rust-lang#123689 (Add const generics support for pattern types)
 - rust-lang#123701 (Only assert for child/parent projection compatibility AFTER checking that theyre coming from the same place)
 - rust-lang#123702 (Further cleanup cfgs in the UI test suite)
 - rust-lang#123706 (rustdoc: reduce per-page HTML overhead)

r? `@ghost`
`@rustbot` modify labels: rollup
bors merged commit 38af5f9 into rust-lang:master on Apr 10, 2024
rustbot added this to the 1.79.0 milestone on Apr 10, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Apr 10, 2024
Rollup merge of rust-lang#123534 - ChrisDenton:name, r=workingjubilee

Windows: set main thread name without re-encoding

As a minor optimization, we can skip the runtime UTF-8 to UTF-16 conversion.
Labels
O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.