Skip to content

Conversation

@zbraniecki
Copy link
Owner

@zbraniecki zbraniecki commented Aug 8, 2019

I separated out the new helper crate from #7, applied @raphlinus feedback and added benchmarks.

@zbraniecki
Copy link
Owner Author

zbraniecki commented Aug 8, 2019

@raphlinus - would you be willing to review this?

@zbraniecki zbraniecki mentioned this pull request Aug 8, 2019
@zbraniecki zbraniecki force-pushed the tinystr2 branch 2 times, most recently from 6bab977 to 1ed30c8 Compare August 9, 2019 17:00
@zbraniecki zbraniecki requested a review from Manishearth August 9, 2019 18:32
@Manishearth
Copy link
Collaborator

@zbraniecki I was suggesting this be a separate repo 😄, I think this crate has utility outside of unic-locale

@Manishearth
Copy link
Collaborator

Also, crate name bikeshed: filament

(tinystr may be more searchable)

@zbraniecki
Copy link
Owner Author

Yep! I'd like to first get a review of this code, before I land it - but I'll land it in a separate repo.

(tinystr may be more searchable)

Yeaaa... I prefer searchability. I also am considering tinystring because it allocates.

@Manishearth
Copy link
Collaborator

Wait, how does it allocate?

@zbraniecki
Copy link
Owner Author

Wait, how does it allocate?

Does it not? It owns the data it stores, right?

@raphlinus
Copy link

raphlinus commented Aug 9, 2019

It shouldn't allocate, it uses a scalar. So it can happily be #[no_std].

Filament is a superb physically based rendering engine, but I understand the appeal.

@Manishearth
Copy link
Collaborator

Does it not? It owns the data it stores, right?

It stores things inline in an integer

@zbraniecki
Copy link
Owner Author

Ah, ok. Seems like my understanding of terminology is fuzzy :( Sorry for that! tinystr is good.

I have one remaining open item from @raphlinus review in the previous issue:

(about the unchecked constructor) One thing to watch out for is endianness: I recommend the TinyStr constructor always take the string in little-endian, so do .from_le() to swap on a BE system. It's very difficult to test on big-endian systems, as they're rare (but do exist).

I'm not sure how to do that - I'm not familiar with BE/LE systems so I also don't know how to test it.

@raphlinus
Copy link

I'm not sure how to do that - I'm not familiar with BE/LE systems so I also don't know how to test it.

I'm willing to desk-check, but here's the basic idea. The struct always stores in "native endianness" so that the transmute to &str is zero cost. Thus, "abcd" is 0x64636261 on LE systems and 0x61626364 on BE. What I'm recommending is that we always use LE in the API, so the constructor takes 0x64636261 in both cases, calls .from_le() on it which is a no-op on LE systems, but swaps bytes on BE.

A similar strategy is used to compute the various masks testing for letter classes, etc. This will have a small performance impact on BE systems, but makes it a lot easier to reason about.

@zbraniecki
Copy link
Owner Author

@raphlinus - thank you! Does it mean I should also do to_le() in to_raw_parts() ?

@raphlinus
Copy link

Yes, if (I haven't dug in) the intent is to provide a value that will be used by the constructor. The important thing is that .to_le() and .from_le() are balanced. For your macro case, it should all compile out even on BE machines, as these methods are const fn.

@zbraniecki
Copy link
Owner Author

Great! I added it to dumping and the unchecked constructor. I'll document it once we get to the docs phase.

@raphlinus - would you have time to skim through the code and verify if it looks like you intended it to look like? I'll then separate it to a new repo and try to maintain although a lot of the math in it is above my skill level :)

Copy link

@raphlinus raphlinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, some endian fixups.

mask: u32,
) -> Result<NonZeroU32, Error> {
// Mask is always supplied as little-endian.
let mask = mask.to_le();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An incredibly minor nit, but this should be from_le. The semantics are the same, but it communicates the intent of converting from a little-endian to a native-endian representation.

let word = self.0.get().to_le();
let mask = ((word + 0x3f3f_3f1f) & !(word + 0x2525_2505) & 0x8080_8080) >> 2;
let result = (word | mask) & !(0x20 & mask);
unsafe { TinyStr4(NonZeroU32::new_unchecked(result.to_le())) }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this should be from_le as well, for the same reason.

}
}

impl Into<u32> for TinyStr4 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear whether this is intended to convert into the native or portable (little-endian) form. Code as written is native.

let mask = 0x80808080_80808080u64 >> (8 * (8 - len));
// TODO: could do this with #cfg(target_endian), but this is clearer and
// more confidence-inspiring.
let mask = mask.to_le();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_le

}

impl Into<u64> for TinyStr8 {
fn into(self) -> u64 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@Manishearth
Copy link
Collaborator

Overall seems good to me aside from the endianness stuff

@zbraniecki
Copy link
Owner Author

Moved to a separate repo: https://github.com/zbraniecki/tinystr

@zbraniecki zbraniecki closed this Aug 10, 2019
@zbraniecki zbraniecki deleted the tinystr2 branch October 29, 2019 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants