-
Notifications
You must be signed in to change notification settings - Fork 13.9k
std: Stabilize the std::hash module #19673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hashes should have the option of single pass implementations rather than ones with state and this doesn't seem like it's enough to provide that. |
|
Does single-pass mean that the hasher gets to see e.g. the whole array of bytes instead of getting a look at them one at a time? |
|
@thestinger I'm not quite sure what you mean by "single pass", could you elaborate? Implementations of |
|
@tbu-: Yes, it means it doesn't have to be a state machine. |
|
@alexcrichton: The fact that the default hash implementations don't have single pass implementations for contiguous blocks of memory is a major performance issue and should be addressed in the design. I don't see how it's ready to be stabilized if these issues haven't even been considered. |
|
I'm trying to drill down into something more specific because what you've said so far sounds like this is already possible with the PR. I would like to make your concerns more concrete because the initial design of the Specifically:
It would be helpful for you to be a little more concrete in your concerns to make sure that we can address them! |
src/libcollections/hash/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speified -> specified
|
@alexcrichton I believe that this is not possible due to the current standard implementation for any slice that just hashes the members separately. |
|
@alexcrichton: A single-pass implementation is more efficient than a stateful one even if it's only using a single call to |
0e649fa to
d4b85f3
Compare
|
@tbu- I agree that we can't generically hash a slice of bytes as just a call to @thestinger is this what you are referring to? I still don't understand your concerns about state because with the specialization of the hasher type parameter on the |
d4b85f3 to
c47f2ed
Compare
This commit aims to stabilize the `std::hash` module by standardizing on its
hashing interface while rationalizing the current usage with the `HashMap` and
`HashSet` types. The primary goal of this slight redesign is to separate the
concepts of a hasher's state from a hashing algorithm itself.
The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.
This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:
trait Hasher: Writer {
type Output;
fn reset(&mut self);
fn finish(&self) -> Output;
}
Note that the `Output` associated type is currently a type parameter due to
associated types not being fully implemented yet. This new `Hasher` trait
emphasizes that all hashers are sinks for bytes, and hashing algorithms may
produce outputs other than a `u64`, so a the output type is made generic.
With this definition, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an experimental addition for
now. The current definition looks like:
trait HashState {
type H: Hasher;
fn hasher(&self) -> H;
}
Note that the `H` associated type (along with its `O` output) are both type
parameters on the `HashState` trait due to the current limitations of associated
types. The purpose of this trait is to emphasize that the one piece of
functionality for implementors is that new instances of `Hasher` can be created.
This conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`,
not a `Hasher`.
Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.
Some other stability decision made for the `std::hash` module are:
* The name of the module, hash, is `#![stable]`
* The `Hash` and `Hasher` traits are `#[unstable]` due to type parameters that
want to be associated types.
* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
reexported in the `hash` module.
There are many breaking changes outlined above, and as a result this commit is
a:
[breaking-change]
c47f2ed to
085838c
Compare
Calls to I'm assuming the problem is that LLVM doesn't understand the semantics of pub trait MyHash<S: Writer = SipHasher> for Sized? {
fn myhash<'a>(&'a self, state: &mut S, msg: &mut Option<&'a [u8]>);
}
pub fn myhash<T: MyHash<SipHasher>>(value: &T) -> u64 {
let mut state = SipState::new();
let mut msg = None;
value.myhash(&mut state, &mut msg);
if let Some(tail_msg) = msg {
state.write(tail_msg);
};
state.finish()
}
// for primitive integer types
// impl<S: Writer> MyHash<S> for $ty
fn myhash<'a>(&'a self, state: &mut S, msg: &mut Option<&'a [u8]>) {
if (*self as $ty).to_le() == *self as $ty { unsafe {
let a = slice::from_raw_buf(mem::transmute(&self), mem::size_of::<$ty>());
let a_ptr = a.as_ptr();
match msg {
&Some(ref mut m) if a_ptr == m.as_ptr().offset(m.len() as int) => {
*msg = slice::from_raw_buf(mem::transmute(&m.as_ptr()),
m.len() + a.len());
}
_ => {
if let Some(msg_part) = *msg { state.write(msg_part); }
*msg = Some(a);
}
}
} } else {
let a: [u8, ..::core::$ty::BYTES] = unsafe {
mem::transmute((*self as $ty).to_le() as $ty)
};
if let Some(msg_part) = *msg { state.write(msg_part); }
state.write(a.as_slice());
*msg = None;
}
}I can only imagine it working efficiently with generous inlining and optimizations. Currently, the check for |
|
One piece of infrastructure that might be relevant is a function for combining adjacent slices that I am working on. It is here in the playpen for now; I intend to write a PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Sized??
|
I've looked into some other languages, and it seems that the common trend is for a definition that looks something like: trait Hash {
fn hash(&self) -> uint;
}The drawback of this, however, is that containers like impl Hash<uint> for MyType {
fn hash(&self, state: &mut uint) {
*state = self.my_hash();
}
}When it comes to optimizing hashing or trying to make an implementation that doesn't work incrementally (which it sounds like @pczarn, @thestinger are alluding to), one of the problems that jumps out is how to deal with aggregate structures with From what I can tell, one can always specialize hashing to perform optimally for any one particular type (minimizing @pczarn I'm sorry I didn't quite follow your comment, but are you basically saying that requiring |
|
Bump. |
|
I'd like to rebase this with a rewrite with true associated types, but I'm hitting a number of errors in the compiler which are preventing the usage of associated types. @nikomatsakis has a fix though, and I'll rebase as soon as we have a snapshot with those fixes. |
This commit aims to stabilize the
std::hashmodule by standardizing on itshashing interface while rationalizing the current usage with the
HashMapandHashSettypes. The primary goal of this slight redesign is to separate theconcepts of a hasher's state from a hashing algorithm itself.
The primary change of this commit is to separate the
Hashertrait into aHasherand aHashStatetrait. Conceptually the oldHashertrait wasactually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old
Hashertrait wasactually fairly unrelated to hashing.
This commit redesigns the existing
Hashertrait to match what the notion of aHashernormally implies with the following definition:Note that the
Outputassociated type is currently a type parameter due toassociated types not being fully implemented yet. This new
Hashertraitemphasizes that all hashers are sinks for bytes, and hashing algorithms may
produce outputs other than a
u64, so a the output type is made generic.With this definition, the old
Hashertrait is realized as a newHashStatetrait in the
collections::hash_statemodule as an experimental addition fornow. The current definition looks like:
Note that the
Hassociated type (along with itsOoutput) are both typeparameters on the
HashStatetrait due to the current limitations of associatedtypes. The purpose of this trait is to emphasize that the one piece of
functionality for implementors is that new instances of
Hashercan be created.This conceptually represents the two keys from which more instances of a
SipHashercan be created, and aHashStateis what's stored in aHashMap,not a
Hasher.Implementors of custom hash algorithms should implement the
Hashertrait, andonly hash algorithms intended for use in hash maps need to implement or worry
about the
HashStatetrait.Some other stability decision made for the
std::hashmodule are:#![stable]HashandHashertraits are#[unstable]due to type parameters thatwant to be associated types.
Writertrait remains#[experimental]as it's intended to be replacedwith an
io::Writer(more details soon).hashfunction is#[unstable]as it is intended to be genericover the hashing algorithm instead of hardwired to
SipHashersipmodule is now private as its one export,SipHasherisreexported in the
hashmodule.There are many breaking changes outlined above, and as a result this commit is
a:
[breaking-change]