Recently, I tried to create a hash map whose key is a "byte string" in Rust.
A naive version would be something like this:
let mut m: HashMap<Vec<u8>, V> = HashMap::new();
Note that using a borrowed type, like &[u8], as the key is not a good idea. Of course, the naive version does work, but it is a little inconvenient if you would like to use a key like a "string", e.g.,
m.get("ascii string")
However, this call causes a compilation error.
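For reference, here is a minimal sketch of what does compile with Vec<u8> keys. The helper name lookup_ascii and the value type i32 are made up for illustration. Since Vec<u8> implements Borrow<[u8]>, a byte slice is accepted as the lookup key, so the string still has to be turned into bytes explicitly.

```rust
use std::collections::HashMap;

// Hypothetical helper: look up a `str` key in a map keyed by `Vec<u8>`.
fn lookup_ascii(m: &HashMap<Vec<u8>, i32>, key: &str) -> Option<i32> {
    // `m.get(key)` does not compile here: Vec<u8> does not implement Borrow<str>.
    // Vec<u8> does implement Borrow<[u8]>, so a byte slice is accepted.
    m.get(key.as_bytes()).copied()
}
```

The explicit as_bytes() call is exactly the kind of conversion I would like to avoid writing at every call site.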
Using Borrow trait -- failed
The type of the key argument of most HashMap<K, V> instance methods is not K. For example,
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where K: Borrow<Q>, Q: Hash + Eq
fn remove<Q: ?Sized>(&mut self, k: &Q) -> Option<V>
    where K: Borrow<Q>, Q: Hash + Eq
So, what is Borrow<Q>? Its definition is:
pub trait Borrow<Q>
    where Q: ?Sized
{
    fn borrow(&self) -> &Q;
}
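As a small sketch of how this trait is used in generic code, the hypothetical helper below (char_count is not part of any library) accepts anything that can be borrowed as str. It relies on the standard impl Borrow<str> for String and on the blanket impl Borrow<T> for T.

```rust
use std::borrow::Borrow;

// Hypothetical helper: works for any type that can be borrowed as `str`,
// e.g. `String` (owned) or `str` itself via the blanket `impl Borrow<T> for T`.
fn char_count<T: Borrow<str> + ?Sized>(value: &T) -> usize {
    // Fully qualified call, so the target type `str` is unambiguous.
    let s: &str = <T as Borrow<str>>::borrow(value);
    s.chars().count()
}
```

Both char_count(&String::from("abc")) and char_count("abc") go through the same code path.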
This trait was introduced for handling both owned and borrowed versions of a type at once (see RFC#235). For instance, it allows an item of HashMap<String, V> to be looked up with a reference to a String (owned version) and also with a &str (borrowed version). Before the introduction of this trait, people had to convert &str to String with the to_string() method.
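A minimal sketch of both kinds of lookup on a HashMap<String, V> follows. The function names and the map contents are invented for illustration.

```rust
use std::collections::HashMap;

// Hypothetical example map; keys and values are made up for illustration.
fn build_map() -> HashMap<String, i32> {
    let mut m = HashMap::new();
    m.insert(String::from("one"), 1);
    m.insert(String::from("two"), 2);
    m
}

fn lookup_both_ways(m: &HashMap<String, i32>) -> (Option<i32>, Option<i32>) {
    let owned: String = String::from("one");
    // Q = String: uses the blanket impl Borrow<String> for String.
    let by_owned = m.get(&owned).copied();
    // Q = str: uses impl Borrow<str> for String; no String is allocated.
    let by_borrowed = m.get("two").copied();
    (by_owned, by_borrowed)
}
```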
So, my problem is the same: I would like to use HashMap<Vec<u8>, V> without explicitly converting a &str key to Vec<u8>. This should be achievable by writing a Borrow<str> implementation for the key type, an easy job! ... However, disappointingly, it turns out to be impossible. There is (maybe) no safe way to borrow a str from a Vec<u8>, because a str must be valid UTF-8 and hence not every Vec<u8> has a corresponding str representation.
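To illustrate the obstacle, here is a small sketch using the checked conversion std::str::from_utf8. The helper name try_as_str is hypothetical. The conversion can fail, whereas Borrow::borrow has to return a &str unconditionally.

```rust
// Hypothetical helper: try to view a byte sequence as `str` with the checked,
// safe conversion from the standard library.
fn try_as_str(bytes: &[u8]) -> Option<&str> {
    // from_utf8 returns Err when the bytes are not valid UTF-8.
    std::str::from_utf8(bytes).ok()
}
```

An ASCII key converts fine, but a sequence such as [0xff, 0xfe] yields None, and borrow() has no way to report that.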
Nevertheless, there is an unsafe way to accomplish this:
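use std::borrow::Borrow;
use std::hash::{Hash, Hasher};

#[derive(PartialEq, Eq)]
pub struct ByteString(pub Vec<u8>);

impl ByteString {
    // Unsafe: the caller must guarantee that the bytes are valid UTF-8,
    // because from_utf8_unchecked skips the validity check.
    pub unsafe fn as_str(&self) -> &str {
        std::str::from_utf8_unchecked(&self.0)
    }
}

impl Borrow<str> for ByteString {
    fn borrow(&self) -> &str {
        // Nothing here guarantees valid UTF-8; this is the unsafe part.
        unsafe { self.as_str() }
    }
}

// Hash through str so that a ByteString and its borrowed str hash identically,
// as Borrow requires.
impl Hash for ByteString {
    fn hash<H>(&self, state: &mut H)
    where
        H: Hasher,
    {
        unsafe { self.as_str().hash(state) }
    }
}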
Here, the key type Vec<u8> is wrapped as ByteString so that traits can be added to it. The important point is that ByteString::as_str() is implemented with the unsafe function std::str::from_utf8_unchecked, which skips the check of whether the given byte sequence is valid UTF-8. I checked that it works fine for several non-UTF-8 byte sequences. Still, that function requires its input to be valid UTF-8, so no one knows what will happen with this unsafe code.
Conclusion
Using a string directly to access an item of a hash map with byte-string keys remains a daydream. My current answer is to add a method that converts str to ByteString, just as in the age before the Borrow trait. So sad.
#[derive(PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

trait ToByteString {
    fn to_bytestring(&self) -> ByteString;
}

impl ToByteString for str {
    fn to_bytestring(&self) -> ByteString {
        ByteString(self.as_bytes().to_vec())
    }
}
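A lookup with this approach could look like the sketch below. It repeats the definitions so that it compiles on its own. The helper lookup and the value type i32 are only for illustration.

```rust
use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

trait ToByteString {
    fn to_bytestring(&self) -> ByteString;
}

impl ToByteString for str {
    fn to_bytestring(&self) -> ByteString {
        ByteString(self.as_bytes().to_vec())
    }
}

// Hypothetical lookup helper; the value type i32 is only for illustration.
fn lookup(m: &HashMap<ByteString, i32>, key: &str) -> Option<i32> {
    // The conversion allocates a new Vec<u8> on every lookup.
    m.get(&key.to_bytestring()).copied()
}
```

Non-UTF-8 keys such as ByteString(vec![0xff, 0x00]) can live in the same map, and no unsafe code is involved. The price is one allocation per lookup.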
I feel it is too restrictive that "basic" strings have to be valid UTF-8 in Rust.