-
Notifications
You must be signed in to change notification settings - Fork 14
Fix str encode in ractors #674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix str encode in ractors #674
Conversation
encoding.c
Outdated
size_t input_len = strlen(name); | ||
switch(input_len) { | ||
case 5: | ||
if (strncmp(name, string_UTF_8, 5) == 0) { | ||
return ENCINDEX_UTF_8; | ||
} | ||
case 8: | ||
if (strncmp(name, string_US_ASCII, 8) == 0) { | ||
return ENCINDEX_US_ASCII; | ||
} | ||
case 10: | ||
if (strncmp(name, string_ASCII_8BIT, 10) == 0) { | ||
return ENCINDEX_ASCII_8BIT; | ||
} | ||
default: | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary? If one care about perf, they're use Encoding
objects (e.g. Encoding::UTF_8
), not encoding names. Also all these encodings have tons of aliases so I'm not even sure this opt will match often.
>> Encoding.find("utf-8")
=> #<Encoding:UTF-8>
>> Encoding.find("UTF-8")
=> #<Encoding:UTF-8>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out it was going through this path fairly often because when doing any kind of transcoding, only the names of the encodings are passed, not the rb_encoding
s . I added an internal API (static functions) that does allow passing encodings now, so this should be less necessary. However, other files still reference these other functions, so this is still a win imo, especially with ractors.
Edit: I changed it to use STRCASECMP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to use STRCASECMP
I still really don't like this. We should fix whatever API is calling this, to pass a rb_encoding *
instead.
It's not just about capitalization:
>> Encoding.find("ascii")
=> #<Encoding:US-ASCII>
>> Encoding.find("us-ascii")
=> #<Encoding:US-ASCII>
I understand that this short circuit pays off, but it's really not clean.
e7c1df8
to
ff72419
Compare
@Shopify/byroot It's ready for a re-review. I split my original branch into 2, this PR now includes only deadlock fixes (no perf gains). The other branch is built on top of this and includes perf. gains. I will make a PR for the other branch when it's ready if this gets merged. 🙇 |
@@ -6684,14 +6690,15 @@ env_shift(VALUE _) | |||
VALUE result = Qnil; | |||
VALUE key = Qnil; | |||
|
|||
rb_encoding *enc = env_encoding(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So env_encoding
calls rb_locale_encoding
which calls rb_locale_encindex()
, which acquire the VM lock, so this isn't helping much.
I'd start by making rb_locale_encindex()
lock-free in most case. A simple atomic check to see if the locale encoding was initialized should suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was more that env_encoding
could load the encoding, and that needed to be done outside the VM lock or else it deadlocks but I'll look into making it atomic 👍 It shouldn't load the encoding often, but native exts could call setlocale()
in which case it could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. That makes sense.
Perhaps it doesn't matter, as this is only used when accessing ENV
, which hopefulyl wouldn't be in a hotspot. But yeah, the amount of work done every time is surprising.
I'll look into making it atomic
No need. Let's make if safe. We'll see about optimizing later.
51fe205
to
c5b37eb
Compare
Make sure VM lock is not held when calling `load_transcoder_entry`, as that causes deadlock inside ractors. `String#encode` now works inside ractors, among others.
c5b37eb
to
1c9856c
Compare
Without this, wbcheck would sometimes hit a missing write barrier. Co-authored-by: John Hawthorn <[email protected]>
49d1da5
to
6f5a722
Compare
This st_table can be inserted into at runtime when autoloading encodings.
Author is @luke-gruber, opening a draft to ease review.