D6773: rust-hgpath: add HgPath and HgPathBuf structs to encapsulate handling of paths

Yuya Nishihara yuya at tcha.org
Sat Aug 31 01:04:28 EDT 2019


On Fri, 30 Aug 2019 14:38:47 +0000, Alphare (Raphaël Gomès) wrote:
> +    pub fn as_bytes(&self) -> &[u8] {
> +        unsafe { &*(&self.inner as *const _ as *const [u8]) }
> +    }

`&self.inner`. No unsafe trick is needed.

> +    /// Checks for errors in the path, short-circuiting at the first one.
> +    /// Useful to get finer-grained errors. To simply check if the path is
> +    /// valid, use `is_valid`.
> +    pub fn check_state(&self) -> Result<(), HgPathError> {
> +        if self.len() == 0 {
> +            return Ok(());
> +        }
> +        let bytes = self.as_bytes();
> +        let mut previous_byte = None;
> +
> +        if bytes[0] == b'/' {
> +            return Err(HgPathError::LeadingSlash);
> +        }
> +        for (index, byte) in bytes.iter().enumerate() {
> +            match byte {
> +                0 => return Err(HgPathError::ContainsNullByte(index)),
> +                b'/' => {
> +                    if previous_byte.is_some() && previous_byte == Some(b'/') {

`previous_byte.is_some()` is redundant.

> +                        return Err(HgPathError::ConsecutiveSlashes(index));
> +                    }
> +                }
> +                _ => (),
> +            };
> +            previous_byte = Some(*byte);
> +        }
> +        Ok(())
> +    }
> +    pub fn is_valid(&self) -> bool {
> +        self.check_state().is_ok()
> +    }

I'm not sure if this `HgPath` type is designed to be always validated or not.
If yes, these functions can be private, and we'll need a conservative
`HgPath::from_bytes(s) -> Result<&HgPath>` and `unsafe fn from_bytes_unchecked(s)`
functions. If not, maybe `HgPath::new()` shouldn't validate the bytes at all.

I feel `debug_assert()` implies that the condition must always be met,
so the caller has to guarantee that.

> +impl Index<usize> for HgPath {
> +    type Output = u8;
> +
> +    fn index(&self, i: usize) -> &Self::Output {
> +        &self.inner[i]
> +    }
> +}

[...]

Better to not implement `path[...]` since it can be considered returning
a path component at the specified index. We can always do
`path.as_bytes()[...]` explicitly.

> +#[derive(Debug)]
> +pub struct HgPathBytesIterator<'a> {
> +    path: &'a HgPath,
> +}

Can't we simply implement it as a `slice::Iter<>` wrapper?

> +    pub fn as_vec(&self) -> &Vec<u8> {
> +        &self.inner
> +    }
> +    pub fn as_ref(&self) -> &[u8] {
> +        self.inner.as_ref()
> +    }

Do we need `as_vec()`? I think `as_ref()` can be used in most cases.

> +impl<T: ?Sized + AsRef<HgPath>> From<&T> for HgPathBuf {
> +    fn from(s: &T) -> HgPathBuf {
> +        let new = s.as_ref().to_owned();
> +        debug_assert_eq!(Ok(()), new.check_state());
> +        new
> +    }

No need to do `new.check_state()`?
`s` should be already _checked_ if the type supports `s.as_ref()`.

> +impl TryInto<PathBuf> for HgPathBuf {
> +    type Error = std::io::Error;
> +
> +    fn try_into(self) -> Result<PathBuf, Self::Error> {
> +        let os_str;
> +        #[cfg(unix)]
> +        {
> +            use std::os::unix::ffi::OsStrExt;
> +            os_str = std::ffi::OsStr::from_bytes(&self.inner);
> +        }
> +        #[cfg(windows)]
> +        {
> +            // TODO: convert from Windows MBCS (ANSI encoding) to WTF8.
> +            // Perhaps, the return type would have to be Result<PathBuf>.
> +            unimplemented!();
> +        }
> +
> +        Ok(Path::new(os_str).to_path_buf())
> +    }

As I said, the encoding of HgPathBuf will be either MBCS or UTF-8 on
Windows, and the encoding will be selected at runtime. We can't define
a single conversion between HgPathBuf and PathBuf/OsString.


More information about the Mercurial-devel mailing list