Downcast any phantom data

Yet another data processing orchestrator

I'm working on a framework that has the goal of making data processing pipelines simpler. My first use case will be data extraction from web pages. But first of all, I want to improve my Rust skills. I want to explore how the type system can be used to make the maintenance of such pipelines simpler and more stable.

Yesterday, I found a solution for one of my challenges. It involves the Any trait, its downcast_ref method and some PhantomData. Yes, the title of this article is made up of these words, is kind of clickbait and does not make any sense.

But I like my solution, so I'll share it here. If it helps other people get into Rust: great! If somebody gives me feedback: even better!

The problem

My approach is inspired by Luigi. This article is about building the dependency tree, which is a DAG. The nodes in the tree have two roles:

The general one: Every node is an asset. Each asset needs to provide some shared functionality, like "are you up-to-date?", "what's your metadata to be displayed in a UI", "bring your state up-to-date!", ... This functionality will be used by the orchestrator of the framework.
The specific one: Each individual node is responsible for a certain subtask, like downloading a web page. The content will be passed to all dependent nodes. The dependent nodes will obviously expect some concrete data from their dependencies. They need to know the concrete type. Just knowing that it's an abstract asset that provides "something" will not work out.

Sounds like a typical OOP problem. Approaching it like this in Rust often results in hitting a wall, because the pattern does not work in Rust. This time I did better! Here we go!

The solution

Reference counting smart pointer

Nodes will hold "references" to possibly multiple input nodes and might themselves be the input for multiple nodes. Modeling this with plain Rust references is tricky, if possible at all. The tree will be static during runtime and is not performance critical. I aim for ease of use, so we will go for Rc<T>. But what will T be?

Nodes will have to provide a shared interface, which will obviously be a trait. They will need to return their dependencies, so the Asset trait will look like this (not yet valid Rust code!):

trait Asset {
    fn get_dependencies(&self) -> Vec<Rc<T>>;
}

This makes it quite obvious that T cannot be the concrete type of a node. This part of the code is about the "general role" of the node. So the obvious choice for T is dyn Asset. To keep the code clean and readable, let's introduce a wrapper struct:

struct Handle(Rc<dyn Asset>);
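
To illustrate why Rc fits here, the following is a minimal sketch of one node feeding two others. The names Source, Consumer, name and share_one_source are made up for this illustration and are not part of the framework.

```rust
use std::rc::Rc;

// Hypothetical, stripped-down shared interface for this sketch.
trait Asset {
    fn name(&self) -> String;
}

struct Source;

impl Asset for Source {
    fn name(&self) -> String {
        "source".to_string()
    }
}

// A node that depends on exactly one other node.
struct Consumer {
    input: Rc<dyn Asset>,
}

// Builds one source that is the input of two consumers and reports
// (number of owners of the source, whether both consumers see the same
// node, the name read through the first consumer).
fn share_one_source() -> (usize, bool, String) {
    let source: Rc<dyn Asset> = Rc::new(Source);
    let first = Consumer { input: Rc::clone(&source) };
    let second = Consumer { input: Rc::clone(&source) };

    // Both consumers point at the same allocation, not at copies.
    let same_node = Rc::ptr_eq(&first.input, &second.input);
    (Rc::strong_count(&source), same_node, first.input.name())
}
```

Cloning an Rc only bumps a reference count, so sharing a node between several dependents is cheap and needs no lifetime annotations.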

Building a tree

Let's make up a simple example: One step downloads the HTML from a URL, a second step extracts data from it. We would get something like this:

struct WebPage {
    ...
}
impl Asset for WebPage { ... }

struct Extractor {
    page: Handle
}
impl Asset for Extractor { ... }

This Extractor can store the "reference" to an input page in the page field. But it could be any type of node, which is already a problem. We want the node to be a WebPage. And even if it were a WebPage we would not know it, because the information would be "lost" as soon as we stored it in page.

PhantomData

The solution seems to be obvious: We need to somehow include the information about the required node type in Handle. My first try looked like this:

struct Handle<T: Asset + 'static>(Rc<dyn Asset>);

impl<T: Asset + 'static> Handle<T> {
    // functions that will rely on knowing T
}

The compiler will complain about the type T not being used in the struct, which is correct. The solution is PhantomData:

struct Handle<T: Asset + 'static>(Rc<dyn Asset>, PhantomData<T>);

PhantomData is a zero-sized type, so it does not take any space and adds nothing to the size of Handle. But it makes the compiler happy, because T is used in the struct. Our Extractor will need to change to:

struct Extractor {
    page: Handle<WebPage>
}

Looks better: We remember what's inside Handle, and as long as a Handle<T> can only be created from a T, the compiler will prevent us from putting a node of the wrong type into it.
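
Here is a small sketch of how the typed handle could be constructed. The constructor Handle::new, the url field and the helper functions are assumptions for illustration; the article does not show how handles are created.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::rc::Rc;

trait Asset {}

struct WebPage {
    url: String, // hypothetical field
}

impl Asset for WebPage {}

struct Handle<T: Asset + 'static>(Rc<dyn Asset>, PhantomData<T>);

impl<T: Asset + 'static> Handle<T> {
    // Hypothetical constructor: it only accepts a `T`, so a
    // `Handle<WebPage>` cannot be built from any other node type.
    fn new(asset: T) -> Self {
        Handle(Rc::new(asset), PhantomData)
    }
}

struct Extractor {
    page: Handle<WebPage>,
}

fn build_extractor(url: &str) -> Extractor {
    // `Handle::new(SomeOtherAsset { .. })` would be rejected here,
    // because the field expects a `Handle<WebPage>`.
    Extractor {
        page: Handle::new(WebPage { url: url.to_string() }),
    }
}

// The marker is zero-sized: the handle is as big as the bare `Rc`.
fn handle_has_no_size_overhead() -> bool {
    size_of::<Handle<WebPage>>() == size_of::<Rc<dyn Asset>>()
}
```

The type parameter lives purely at compile time. At runtime the handle is still just an Rc<dyn Asset>.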

The Any trait

Progress! But how do we access the original WebPage in the following code?

impl WebPage {
    fn web_page_specific_function(&self) -> SomeResult { ... }
}

impl Extractor {
    fn do_something(&self) {
        // How to call web_page_specific_function?
    }
}

The solution is the Any trait. It allows downcasting dyn Asset to a concrete type. To enable that functionality we need to make Any a supertrait of Asset:

trait Asset: Any {
    ...
}

Having that in place, we can try to access the methods of the concrete type. Here is the first straightforward approach:

impl<T: Asset + 'static> Handle<T> {
    fn get_original_ref(&self) -> &T {
        let any: &dyn Any = self.0.as_ref();
        any.downcast_ref().unwrap()
    }
}

The first line gets a reference to the value inside the Rc and turns it into a &dyn Any. That works because we made Any a supertrait of Asset (this coercion from &dyn Asset to &dyn Any is trait upcasting, which as far as I know needs Rust 1.86 or newer). Having an Any, we can use downcast_ref() to convert it back to a reference to the original type T. downcast_ref will of course only work if you try to downcast to the correct type. It cannot do magic, so it returns an Option<&T>, which is None if the downcast is not possible. In our case we will always use the correct T, the compiler just does not know about that. Therefore I consider it safe to use unwrap() here. Problem solved! But ...
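
As a side note, here is a tiny sketch of downcast_ref in isolation, showing both the successful and the failing case. WebPage with an html field, Image and html_len are made-up names for this example.

```rust
use std::any::Any;

struct WebPage {
    html: String, // hypothetical field
}

struct Image;

// Tries to view an arbitrary value as a `WebPage`.
// Returns the length of its HTML, or `None` if the value is
// something else.
fn html_len(value: &dyn Any) -> Option<usize> {
    value
        .downcast_ref::<WebPage>()
        .map(|page| page.html.len())
}
```

Calling html_len with a WebPage yields Some(length), while calling it with an Image yields None. This None case is what unwrap() would turn into a panic if a handle ever held the wrong type.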

The Deref trait

... calls to get_original_ref will be sprinkled all over the place. Most functions of Extractor and all other Asset implementations will have to use it. And it is just noise, an implementation detail that should be invisible. I dislike having to read such noise in my code!

Taking one step back, it becomes quite obvious that Handle is a specialized smart pointer. It holds a "thing" and takes care of its management, but I actually just want to use it like the "thing" itself. That's the exact use case for the Deref trait. Instead of implementing and using get_original_ref, we can implement Deref for Handle:

impl<T: Asset + 'static> Deref for Handle<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        let any: &dyn Any = self.0.as_ref();
        any.downcast_ref().unwrap()
    }
}

Finally, the code in functions of Extractor will look like this:

impl WebPage {
    fn web_page_specific_function(&self) -> SomeResult { ... }
}

struct Extractor {
    page: Handle<WebPage>
}

impl Extractor {
    fn do_something(&self) {
        let result = self.page.web_page_specific_function();
    }
}
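
For reference, here is a self-contained sketch that puts these pieces together. It rests on a few assumptions that are not in the snippets above: the constructor Handle::new, the html field, and the methods title and extract_title are invented for illustration. It also converts to &dyn Any through a hypothetical as_any method instead of the direct coercion, so that it should compile on older toolchains as well.

```rust
use std::any::Any;
use std::marker::PhantomData;
use std::ops::Deref;
use std::rc::Rc;

trait Asset: Any {
    // Hypothetical helper (not in the article): explicit conversion
    // to `&dyn Any`, implemented by each concrete node type.
    fn as_any(&self) -> &dyn Any;
}

struct WebPage {
    html: String,
}

impl WebPage {
    // Stand-in for `web_page_specific_function`: returns the text
    // between the first `<title>` and the following `</title>`.
    fn title(&self) -> Option<&str> {
        let start = self.html.find("<title>")? + "<title>".len();
        let end = self.html[start..].find("</title>")? + start;
        Some(&self.html[start..end])
    }
}

impl Asset for WebPage {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

struct Handle<T: Asset + 'static>(Rc<dyn Asset>, PhantomData<T>);

impl<T: Asset + 'static> Handle<T> {
    // Hypothetical constructor: the only way to create a handle.
    fn new(asset: T) -> Self {
        Handle(Rc::new(asset), PhantomData)
    }
}

impl<T: Asset + 'static> Deref for Handle<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // The handle was built from a `T`, so the downcast succeeds.
        self.0
            .as_any()
            .downcast_ref::<T>()
            .expect("Handle<T> always wraps a T")
    }
}

struct Extractor {
    page: Handle<WebPage>,
}

impl Extractor {
    fn extract_title(&self) -> Option<String> {
        // Thanks to Deref, `WebPage` methods are callable on the handle.
        self.page.title().map(str::to_string)
    }
}
```

Fields of the wrapped node are reachable the same way, for example self.page.html inside Extractor.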

Final steps

The remaining details are simple. Handle needs a method to get a clone of the inner Rc.

impl<T: Asset + 'static> Handle<T> {
    fn as_rc(&self) -> Rc<dyn Asset> {
        self.0.clone()
    }
}

With that in place, Extractor can return its dependency:

impl Asset for Extractor {
    fn get_dependencies(&self) -> Vec<Rc<dyn Asset>> {
        vec![self.page.as_rc()]
    }
}
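
To show how an orchestrator might use the "general role" only, here is a rough sketch that walks the dependencies of a node. The name method, the url field, Handle::new and collect_names are assumptions for this example. The real orchestrator may look quite different.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

trait Asset {
    // Hypothetical metadata accessor for this sketch.
    fn name(&self) -> String;
    fn get_dependencies(&self) -> Vec<Rc<dyn Asset>>;
}

struct Handle<T: Asset + 'static>(Rc<dyn Asset>, PhantomData<T>);

impl<T: Asset + 'static> Handle<T> {
    // Hypothetical constructor.
    fn new(asset: T) -> Self {
        Handle(Rc::new(asset), PhantomData)
    }

    fn as_rc(&self) -> Rc<dyn Asset> {
        Rc::clone(&self.0)
    }
}

struct WebPage {
    url: String,
}

impl Asset for WebPage {
    fn name(&self) -> String {
        format!("WebPage({})", self.url)
    }

    fn get_dependencies(&self) -> Vec<Rc<dyn Asset>> {
        Vec::new() // a web page has no inputs
    }
}

struct Extractor {
    page: Handle<WebPage>,
}

impl Asset for Extractor {
    fn name(&self) -> String {
        "Extractor".to_string()
    }

    fn get_dependencies(&self) -> Vec<Rc<dyn Asset>> {
        vec![self.page.as_rc()]
    }
}

// Depth-first walk: dependencies are listed before the node that
// needs them. A node shared by several dependents is listed once per
// path, since this sketch does no deduplication.
fn collect_names(node: &dyn Asset, out: &mut Vec<String>) {
    for dep in node.get_dependencies() {
        collect_names(&*dep, out);
    }
    out.push(node.name());
}
```

The walk never needs to know the concrete node types. It only relies on the Asset trait, while Extractor itself keeps the typed Handle<WebPage>.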

If you have questions, comments or improvements, please let me know. Your feedback would be appreciated.
