2024-07-05

Why does Rust need Pin and Unpin?

This article is from https://mayer-pu.medium.com/why-does-rust-need-pin-and-unpin-a50a3f2cc1e2

Using asynchronous Rust libraries is usually straightforward: it feels much like writing regular Rust code, with async and .await added. Writing your own asynchronous library, however, can be challenging, because you run into obscure, hard-to-understand syntax such as T: Unpin and Pin<&mut Self>. This article explains what these mean and why they exist.

Self-referentiality is unsafe

Pin exists to address a very specific problem: self-referential data types, that is, structures that contain pointers into themselves. For instance, a binary search tree might have self-referential pointers that point to other nodes within the same structure.

Self-referential types can be highly useful, but ensuring memory safety with them is challenging. To understand the reasons, let’s use a type with two fields as an example: one named ‘val’ of type i32, and another named ‘pointer’ pointing to an i32.

So far, everything is normal. The pointer field points to the ‘val’ field at memory address A, which contains a valid i32. All pointers are valid, meaning they indeed point to memory that encodes values of the correct type (in this case, i32).

But Rust often moves values in memory. For instance, if the structure is passed to another function, it might be moved to a different memory address, or we might use Box to place it on the heap. Alternatively, if the structure lives in a Vec<MyStruct> and we push more values, the Vec might exceed its capacity and have to move its elements to a new, larger buffer.

When we move it, the fields of the structure change their addresses but not their values. So, the pointer field still points to address A, but there is no longer a valid i32 at address A. The data that used to be there has been moved to address B, and some other values may have been written there now, rendering the pointer invalid.

This is a problem: at best, invalid pointers lead to crashes; at worst, they result in exploitable vulnerabilities. Such a type has to be handled with extreme care, and its users have to be told to update the pointer after every move.
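To make the problem concrete, here is a minimal sketch of a type with the ‘val’ and ‘pointer’ fields described above. The names SelfRef and pointer_is_valid are made up for illustration, and the move is forced by placing the value in a Box, which puts it at a new heap address. The code only compares addresses and never dereferences the stale pointer, so it needs no unsafe.

```rust
// Hypothetical self-referential type: `pointer` is meant to point at `val`.
struct SelfRef {
    val: i32,
    pointer: *const i32,
}

// Returns true while `pointer` still holds the address of this struct's own `val`.
fn pointer_is_valid(s: &SelfRef) -> bool {
    std::ptr::eq(s.pointer, &s.val)
}

fn main() {
    let mut s = SelfRef { val: 42, pointer: std::ptr::null() };
    s.pointer = &s.val; // address A: the pointer now refers to our own field
    assert!(pointer_is_valid(&s));

    // Moving the struct onto the heap copies the fields bit for bit.
    // `val` now lives at a new address B, but `pointer` still holds address A.
    let boxed = Box::new(s);
    assert!(!pointer_is_valid(&boxed));
    println!("val = {}, pointer still valid: {}", boxed.val, pointer_is_valid(&boxed));
}
```

Nothing in the type system stops this move, which is exactly the gap that Pin is designed to close.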

Unpin and !Unpin

In summary, all Rust types fall into two categories:

1. Types that can be safely moved in memory. This is the default and the norm. Examples include primitives such as numbers and booleans, strings, and structures or enums composed entirely of such types. Most types belong to this category!
2. Self-referential types, which are unsafe to move in memory. These are very rare. Examples include some intrusive linked lists in Tokio's internals, and most Future implementations that also borrow data.

Types in category 1 can be moved in memory safely: moving them does not invalidate any pointers. However, if you move a type in category 2, its pointers become invalid and may lead to undefined behavior, as we saw above. In earlier versions of Rust, you had to be very careful with these types: either never move them or, if you did, use unsafe and update all the pointers. Since Rust 1.33, which stabilized Pin and Unpin, the compiler knows which category every type belongs to and can ensure that you only use it safely.

Any type in category 1 automatically implements a special trait called Unpin. The name might seem odd, but its meaning will become clear shortly. Most “normal” types implement Unpin because it is an auto trait (like Send or Sync), so you don’t have to worry about implementing it yourself. If you’re unsure whether a type can be safely moved, check its documentation to see whether it implements Unpin.

Types in category 2 are creatively named !Unpin (the ! in front of a trait means “does not implement”). To use these types safely, ordinary pointers to them are not enough. Instead, they are accessed through special pointers that “pin” their values in place, ensuring they cannot be moved. This is precisely what the Pin type does.
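One way to check which category a type falls into is to ask the compiler directly. The sketch below uses a made-up helper, is_unpin, that only compiles for types implementing Unpin; Plain and NotMovable are illustrative names. std::marker::PhantomPinned is the standard library's marker for opting a type out of Unpin.

```rust
use std::marker::PhantomPinned;

// Compiles only when `T` implements `Unpin` (hypothetical helper for this sketch).
fn is_unpin<T: Unpin>() -> bool {
    true
}

// An ordinary struct: every field is `Unpin`, so the struct is `Unpin` too.
#[allow(dead_code)]
struct Plain {
    id: u32,
    name: String,
}

// `PhantomPinned` is a zero-sized marker that is `!Unpin`,
// so any struct containing it is `!Unpin` as well.
#[allow(dead_code)]
struct NotMovable {
    val: i32,
    _pin: PhantomPinned,
}

fn main() {
    assert!(is_unpin::<i32>());
    assert!(is_unpin::<String>());
    assert!(is_unpin::<Plain>());
    // The next line is rejected by the compiler, because `NotMovable` is `!Unpin`:
    // is_unpin::<NotMovable>();
    println!("all checked types are Unpin");
}
```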

Pin wraps a pointer and prevents the value it points to from being moved. The sole exception is when the value implements Unpin, in which case we know that moving it is safe. Now we can write self-referential structures safely! This matters because, as discussed above, many futures are self-referential, and async/await is built on them.
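As a rough illustration, a self-referential struct can be created already pinned on the heap with Box::pin. The type PinnedSelfRef and its methods are made up for this sketch. Note that one small unsafe block is still needed to set up the pointer; what Pin adds is that safe code using the value afterwards cannot move it.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical self-referential type; `PhantomPinned` makes it `!Unpin`.
struct PinnedSelfRef {
    val: i32,
    pointer: *const i32,
    _pin: PhantomPinned,
}

impl PinnedSelfRef {
    // Builds the value directly inside a pinned heap allocation.
    fn new(val: i32) -> Pin<Box<PinnedSelfRef>> {
        let mut boxed = Box::pin(PinnedSelfRef {
            val,
            pointer: std::ptr::null(),
            _pin: PhantomPinned,
        });
        let ptr: *const i32 = &boxed.val;
        // SAFETY: we only write to a field; the value is never moved out of the pin.
        unsafe { boxed.as_mut().get_unchecked_mut().pointer = ptr };
        boxed
    }

    fn pointer_is_valid(&self) -> bool {
        std::ptr::eq(self.pointer, &self.val)
    }
}

fn main() {
    let pinned = PinnedSelfRef::new(42);
    // Moving the `Pin<Box<_>>` handle moves only the pointer, not the heap value.
    let handle = pinned;
    assert!(handle.pointer_is_valid());
    // Safe code cannot get `&mut PinnedSelfRef` out of the pin, so it cannot
    // move the value itself (for example with `std::mem::swap`).
    println!("val = {}", handle.val);
}
```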

Using Pin

Now that we understand why Pin exists and why Future’s poll method takes a pinned Pin<&mut Self> instead of a regular &mut self, let’s turn to a practical question: how do I get a pinned reference to an inner Future? More generally: given a pinned struct, how do we access its fields?

The solution is to write helper functions that provide you with references to the fields. These references can be regular Rust references, such as &mut, or they can also be pinned. You can choose whichever you need. This is what’s called projection: if you have a pinned struct, you can write a projection method that allows you to access all its fields.
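Written by hand, projection helpers might look like the sketch below. Wrapper, start_mut and future_pinned are illustrative names, not part of any library. The two unsafe blocks are only sound if the rest of the type follows the structural-pinning rules described in the std::pin documentation (for example, never moving the future field out while the struct is pinned).

```rust
use std::pin::Pin;
use std::time::Instant;

// Hypothetical wrapper with one field we treat as unpinned and one as pinned.
struct Wrapper<Fut> {
    start: Option<Instant>,
    future: Fut,
}

impl<Fut> Wrapper<Fut> {
    // Projection to a regular reference: `start` is never treated as pinned.
    fn start_mut(self: Pin<&mut Self>) -> &mut Option<Instant> {
        // SAFETY: we hand out `&mut` to `start` only and never move `future`.
        unsafe { &mut self.get_unchecked_mut().start }
    }

    // Projection to a pinned reference: `future` stays pinned while `self` is.
    fn future_pinned(self: Pin<&mut Self>) -> Pin<&mut Fut> {
        // SAFETY: `future` is not moved for as long as the wrapper is pinned.
        unsafe { self.map_unchecked_mut(|s| &mut s.future) }
    }
}

fn main() {
    let mut wrapper = Box::pin(Wrapper {
        start: None,
        future: std::future::ready(5),
    });

    // Regular `&mut` access to the unpinned field.
    wrapper.as_mut().start_mut().get_or_insert_with(Instant::now);
    assert!(wrapper.start.is_some());

    // Pinned access to the inner future, ready to be polled.
    let _inner: Pin<&mut std::future::Ready<i32>> = wrapper.as_mut().future_pinned();
}
```

Getting these safety arguments right by hand is easy to get wrong, which is the motivation for generating projections with a crate instead.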

Projection is essentially moving data into and out of Pin. For example, we need to get the start: Option<Instant> field out of a Pin<&mut Self>, and we need to put future: Fut into a Pin so that we can call its poll method. If you read the Pin methods, you’ll find that these conversions are always safe when the pointer targets an Unpin value; otherwise, they require unsafe.

// Put data into Pin.
pub        fn new<P: Deref<Target: Unpin>>(pointer: P) -> Pin<P>;
pub unsafe fn new_unchecked<P>(pointer: P) -> Pin<P>;

// Get data from Pin.
pub        fn into_inner<P: Deref<Target: Unpin>>(pin: Pin<P>) -> P;
pub unsafe fn into_inner_unchecked<P>(pin: Pin<P>) -> P;
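For an Unpin type such as String, the safe pair Pin::new and Pin::into_inner can be used freely. The function round_trip below is a made-up example that pins a String, mutates it through the pin, and then takes the reference back out.

```rust
use std::pin::Pin;

// Hypothetical helper: pins a `String`, edits it, then unpins it again.
fn round_trip(mut value: String) -> String {
    // `Pin::new` is safe here because `String` implements `Unpin`.
    let mut pinned: Pin<&mut String> = Pin::new(&mut value);

    // For `Unpin` targets, `Pin` implements `DerefMut`, so normal methods work.
    pinned.push_str(" (pinned)");

    // `Pin::into_inner` is also safe for `Unpin` targets and returns the `&mut String`.
    let inner: &mut String = Pin::into_inner(pinned);
    inner.push('!');

    value
}

fn main() {
    println!("{}", round_trip(String::from("hello")));
}
```

For a !Unpin target, neither of these calls compiles, and only the unsafe new_unchecked and into_inner_unchecked variants remain.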

Using the pin-project library

For each field in the struct, you must decide whether its reference should be pinned. By default, use regular references, as they are simpler to work with. However, if you know that you need a pinned reference, perhaps because you want to call .poll(), whose receiver is Pin<&mut Self>, then mark the field with #[pin].

Here’s an example:

Add the pin-project dependency in the Cargo.toml file:

[dependencies]
pin-project = "1.1.3"

In src/main.rs, write the following code:

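// Imports needed by the struct and by its Future implementation.
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::{Duration, Instant};

#[pin_project::pin_project]
pub struct TimedWrapper<Fut: Future> {
    // For each field, we need to decide whether projection returns an unpinned (&mut)
    // reference to that field or a pinned (Pin<&mut _>) reference.
    // By default, it is unpinned.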
    start: Option<Instant>,
    // This attribute selects a pinned reference.
    #[pin]
    future: Fut,
}

The poll method is implemented as follows:

impl<Fut: Future> Future for TimedWrapper<Fut> {
    type Output = (Fut::Output, Duration);

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        // This will return a type with all the same fields, 
        // except the ones defined with #[pin] will be pinned.
        let mut this = self.project();

        // Call the internal poll, measuring how long it takes.
        let start = this.start.get_or_insert_with(Instant::now);
        let inner_poll = this.future.as_mut().poll(cx);
        let elapsed = start.elapsed();

        match inner_poll {
            // The internal Future needs more time, so this Future also needs more time
            Poll::Pending => Poll::Pending,
            // Success!
            Poll::Ready(output) => Poll::Ready((output, elapsed)),
        }
    }
}

In the end, our goal is achieved, and we did it all without writing any unsafe code ourselves.
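To try out a future like this without pulling in an async runtime, you can poll it by hand. The sketch below is one way to do that with only the standard library; NoopWaker and poll_once are made-up names, and the waker does nothing, so it is only suitable for experiments where you call poll yourself.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that ignores wake-ups; enough for polling by hand in a sketch.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls `fut` exactly once (hypothetical helper, not part of any library).
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

fn main() {
    // `Box::pin` gives us the `Pin<&mut F>` that `poll` requires, via `as_mut()`.
    let mut done = Box::pin(std::future::ready(3));
    assert_eq!(poll_once(done.as_mut()), Poll::Ready(3));

    // A future that never completes reports `Pending`.
    let mut never = Box::pin(std::future::pending::<i32>());
    assert!(poll_once(never.as_mut()).is_pending());
}
```

Assuming TimedWrapper is constructed in the same module, passing Box::pin(TimedWrapper { start: None, future: some_future }).as_mut() to poll_once should exercise its poll method in the same way.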

Summary

If a Rust type has self-referential pointers, it cannot be moved safely. After all, moving does not update the pointers, so they still point to the old memory address, rendering them invalid.

Rust automatically determines which types can be safely moved and implements the Unpin trait for them. If you have a Pin pointing to data that is not Unpin, Rust guarantees that safe code cannot move that data. This is crucial because many Future types are self-referential, so we need Pin to poll them safely. The pin-project crate makes working with pinned structs simpler.