MetaHuman Animator + AI

During a weekend research project, I revisited a familiar workflow with a tool that has been refined within UE5: the MetaHuman Animator.

MetaHuman Animator uses the iPhone's depth sensors to generate a model called a MetaHuman Identity, built from a pre-recorded video of my own face. For me, the Identity is mainly there to process performances at higher fidelity: if the face in the performance is a 1-to-1 match for the model it's being processed onto, I get more accurate results. Through retargeting, that performance can then be mapped to other models - but the further the target head drifts from your own features, the more the performance degrades. In most cases, I found that the performance retargets poorly to other models.
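To make that concrete, here's a rough sketch of why a naive landmark retarget breaks down when the target face differs from the actor's. This is entirely hypothetical illustration code, not Epic's solver - the shapes and function are mine:

```python
import numpy as np

# Hypothetical illustration of naive landmark-based retargeting (not Epic's solver).
# A performance is stored as per-frame landmark positions; each frame is expressed
# as an offset from the actor's neutral pose, then re-applied to a target head
# whose neutral landmarks are laid out differently.

def retarget_naive(performance, actor_neutral, target_neutral):
    """performance: (frames, landmarks, 3) captured landmark positions.
    actor_neutral / target_neutral: (landmarks, 3) neutral poses.
    Returns the performance re-applied to the target head."""
    offsets = performance - actor_neutral   # motion relative to the actor's own face
    return target_neutral + offsets         # the same absolute motion on a different face
```

The problem in practice: if the target's proportions differ (wider mouth, deeper-set eyes), identical offsets land in the wrong places - which is why a 1-to-1 identity gives the cleanest results and cross-character retargets degrade.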

This could be due to many reasons - the capture quality could be poor, the tracking itself could be inaccurate, and so on. But in this case, I suspect my facial features simply don't match this MetaHuman character's. Unless the system is trained to understand which expression is which, it will never retarget gracefully.

A few years ago, I fired up Spider-Man: Miles Morales and remember being extremely confused as to why Peter Parker was an entirely different person. Capture data - at least today - is still pretty sensitive. There isn't a magic solution for retargeting a performance without "training" and without losing fidelity. People's facial features are unique - these tools use them as landmarks to solve trajectories, and animators then turn that information into expressions.
This happens all the time, and it makes sense. Models are often modified to accommodate an actor's facial features so that the "computer" has an easier time processing the data.

Making believable animation requires purposeful artistic choices - otherwise we get the kind of uncanniness that makes us say things like "Something's off, but I'm not sure why." This is why you need animators who have studied, who have worked in the industry for a long time, and who have built an eye for the tiny nuances an AI prompter will never see. But I do see the appeal of cost-cutting solutions, because that feeling of uncanniness can slip past a lot of people.

I believe that AI will be integrated into our pipeline as animators - we will eventually need to adopt a new workflow. But I'm hoping to use this as an example to myself to tread toward a better future where AI is used ethically - as a tool to enhance what animators already know how to do: make captivating performances.

I've been researching AI software that can be ethically integrated into my own workflow. Many systems rely on optically tracked data, using identifiable landmarks and machine learning to build a performance from a 3D skeleton solved out of the 2D video itself - that technology isn't all that new. Some are more unusual, like Motorica, which is generative in nature and uses prompts to craft performances. I reached out to them and asked how the data is sourced: they have artists create the base animations, and the AI side of the tool focuses on seamlessly blending different types of performances into a cohesive whole (prompt example: "Walk Stop Holding Gun"). This can be, and commonly is, done traditionally through a "layered" approach in animation - often with a whole team dedicated to refurbishing existing libraries of capture data, as in the sketch below. In every scenario and every tool I've tried, I see these as a starting point rather than a replacement for setting a key.
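For contrast, here's roughly what that traditional "layered" combine looks like in code - a hypothetical NumPy sketch, not Motorica's method: a base locomotion clip plus an additive layer stored relative to its own reference pose, faded in with a weight curve.

```python
import numpy as np

# Hypothetical sketch of a "layered" combine, roughly what a team refurbishing a
# mocap library does by hand: a base clip (e.g. walk) plus an additive layer
# (e.g. an upper-body "holding gun" pose) eased in over the clip.

def apply_additive_layer(base, additive, reference, weight):
    """base, additive, reference: (frames, joints, channels) animation curves.
    The additive layer is stored relative to its own reference pose;
    weight is a per-frame blend factor in [0, 1]."""
    delta = additive - reference                   # what the layer changes, not its absolute pose
    return base + weight[:, None, None] * delta    # fade that change onto the base motion

frames = 120
weight = np.clip(np.linspace(-0.5, 1.5, frames), 0.0, 1.0)   # ease the layer in over the clip
```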

Markerless systems like the MetaHuman Animator fall within this family of AI-powered tools: video footage is used to track facial landmarks, and machine learning solves those landmarks for expressions. Afterward, I refined the performance using my skills as an animator. Personally, I think that last part is what makes or breaks a piece of animation.
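For the curious, the markerless-tracking half of that pipeline can be approximated with off-the-shelf tools. Here's a minimal sketch using OpenCV and MediaPipe Face Mesh - an open-source tracker, not the MetaHuman Animator solver, and the file name is made up - that pulls per-frame landmarks out of ordinary video. That landmark stream is only the raw material; it still needs an expression solver and, crucially, an animator's pass on top.

```python
import cv2
import mediapipe as mp

# Minimal markerless facial tracking sketch: read a video, track face landmarks
# per frame with MediaPipe Face Mesh. (Illustrative only - not the MetaHuman pipeline.)
mp_face_mesh = mp.solutions.face_mesh

cap = cv2.VideoCapture("performance_take.mp4")   # hypothetical input clip
with mp_face_mesh.FaceMesh(static_image_mode=False,
                           refine_landmarks=True,
                           max_num_faces=1) as face_mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            landmarks = results.multi_face_landmarks[0].landmark   # 478 normalized points
            # ...this is where an expression solver / curve export would take over
cap.release()
```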