Handling HashMap with byte-string keys in Rust

Recently, I tried to create a hash map whose keys are "byte strings" in Rust.

A naive attempt would be something like this:

  let mut m: HashMap<Vec<u8>, V> = HashMap::new();

Note that using a borrowed type such as &[u8] as the key is not a good idea, because the map would then not own its keys. Of course, the Vec<u8> version does work, but it is a little inconvenient if you would like to use the key like a "string", e.g.,

  m.get("ascii string")

However, this call causes a compilation error.

Using the Borrow trait -- failed

The key argument of most HashMap<K, V> instance methods does not have type K. For example,

  fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where K: Borrow<Q>, Q: Hash + Eq

  fn remove<Q: ?Sized>(&mut self, k: &Q) -> Option<V>
    where K: Borrow<Q>, Q: Hash + Eq

So, what is Borrow<Q>? Its definition is

  pub trait Borrow<Q>
      where Q: ?Sized
  {
      fn borrow(&self) -> &Q;
  }

This trait was introduced to handle both the owned and borrowed versions of a type at once (see RFC#235). For instance, it allows an item of HashMap<String, V> to be accessed by a String (owned version) and also by a &str (borrowed version). Before this trait was introduced, people had to convert the &str to a String with the to_string() method.
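
As a minimal sketch of what Borrow allows (the helper names lookup_by_str and lookup_by_bytes are made up for illustration, and i32 stands in for V): the standard library implements Borrow<str> for String and Borrow<[u8]> for Vec<u8>, so both kinds of map can be queried with the borrowed form, without building an owned key.

```rust
use std::collections::HashMap;

// Hypothetical helper: look up a `String`-keyed map with a plain `&str`.
// This type-checks because `String: Borrow<str>`.
fn lookup_by_str(m: &HashMap<String, i32>, key: &str) -> Option<i32> {
    m.get(key).copied()
}

// Hypothetical helper: look up a `Vec<u8>`-keyed map with a byte slice.
// This type-checks because `Vec<u8>: Borrow<[u8]>`.
fn lookup_by_bytes(m: &HashMap<Vec<u8>, i32>, key: &[u8]) -> Option<i32> {
    m.get(key).copied()
}
```

For example, lookup_by_bytes(&m, "ascii string".as_bytes()) type-checks, because str::as_bytes yields a &[u8]. Passing the &str itself does not, since Vec<u8> has no Borrow<str> implementation.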

So my problem is the same: I would like to use HashMap<Vec<u8>, V> without explicitly converting a &str key to Vec<u8>. This should be achievable by writing a Borrow<str> implementation for the key type, an easy job! ... Disappointingly, however, it turns out to be impossible. There is (as far as I know) no safe way to borrow a str from a Vec<u8>, because a str must be valid UTF-8 and hence not every Vec<u8> has a corresponding str representation.
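
To make the obstacle concrete, here is a small sketch using the checked conversion std::str::from_utf8, which returns an error instead of a &str when the bytes are not valid UTF-8. The function name try_as_str is hypothetical.

```rust
// Hypothetical helper: try to view a byte string as `&str` safely.
// Returns `None` when the bytes are not valid UTF-8. `Borrow::borrow`
// has no way to report such a failure (it must always return `&str`),
// which is why a safe `Borrow<str>` for arbitrary bytes is not possible.
fn try_as_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

An ASCII byte string converts fine, while a sequence containing the byte 0xff, which never occurs in valid UTF-8, is rejected.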

Nevertheless, there is an unsafe way to accomplish this:

  use std::borrow::Borrow;
  use std::hash::{Hash,Hasher};

  #[derive(PartialEq, Eq)]
  pub struct ByteString(pub Vec<u8>);

  impl ByteString {
      pub unsafe fn as_str(&self) -> &str {
          let ByteString(ref s) = *self;
          std::str::from_utf8_unchecked(s.as_slice())
      }
  }

  impl Borrow<str> for ByteString {
      fn borrow(&self) -> &str {
          unsafe { self.as_str() }
      }
  }

  impl Hash for ByteString {
      fn hash<H>(&self, state: &mut H)
          where H: Hasher
      {
          unsafe { self.as_str().hash(state) }
      }
  }

Here, the key type Vec<u8> is wrapped in ByteString so that traits can be implemented for it. Hash is implemented by hand via the str so that a ByteString and the str borrowed from it hash identically, as Borrow requires. The important point is that ByteString::as_str() is implemented with the unsafe function std::str::from_utf8_unchecked, which skips the check that the given byte sequence is valid UTF-8. I checked that it appears to work for several non-UTF-8 byte sequences. However, passing invalid UTF-8 to from_utf8_unchecked is undefined behavior, so this code cannot be relied on.

Conclusion

It is a daydream to use a string directly to access an item of a hash map with byte-string keys. My current answer is to add a method that converts a str to a ByteString, just as in the days before the Borrow trait. So sad.

  #[derive(PartialEq, Eq, Hash)]
  pub struct ByteString(pub Vec<u8>);

  trait ToByteString {
      fn to_bytestring(&self) -> ByteString;
  }

  impl ToByteString for str {
      fn to_bytestring(&self) -> ByteString {
          ByteString(self.as_bytes().to_vec())
      }
  }
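
A rough sketch of how this conversion might be used for lookups follows. The definitions are repeated so that the snippet stands alone, and the helper get_by_str is a hypothetical name, not part of any library.

```rust
use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

trait ToByteString {
    fn to_bytestring(&self) -> ByteString;
}

impl ToByteString for str {
    fn to_bytestring(&self) -> ByteString {
        ByteString(self.as_bytes().to_vec())
    }
}

// Hypothetical helper: each lookup first converts the `&str` into an
// owned `ByteString`, which costs one allocation per call.
fn get_by_str<'a, V>(m: &'a HashMap<ByteString, V>, key: &str) -> Option<&'a V> {
    m.get(&key.to_bytestring())
}
```

The map can still hold keys that are not valid UTF-8, such as ByteString(vec![0xff, 0x00]). Those keys simply cannot be reached through get_by_str.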

I feel it is too restrictive that "basic" strings in Rust have to be valid UTF-8.

 