Emotional Alchemy: Brewing Engagement through Facial Animation in Hogwarts Legacy


A childhood dream becomes reality for millions in this adventure into the wizarding world.

Studio: Avalanche Software
Release Date: February 2023
Platforms: PS5, Xbox Series X|S, PS4, Xbox One, Nintendo Switch, PC
Genre: Open World Action RPG


Facial Animation in the World of Hogwarts Legacy

Warner Bros. Games’ Hogwarts Legacy, developed by Avalanche Software, transports players into Hogwarts during the 1800s, creating a fresh way for Harry Potter fans to immerse themselves in the wizarding world. The story follows a student at Hogwarts School of Witchcraft and Wizardry, who holds the key to an ancient secret that threatens to tear the wizarding world apart.

Using Speech Graphics’ SGX audio-driven technology to automate muscle-based facial animation, the studio was able to deliver hundreds of thousands of lines of dialogue with believable emotional expression and dialect-accurate lip-sync, fully localized across 8 languages:

  • English

  • French

  • German

  • Italian

  • Japanese

  • Portuguese (Brazilian)

  • Spanish (Castilian)

  • Spanish (Latin American)


By reducing the need to hand-animate at the keyframe level, the Avalanche team was able to focus its efforts on fine-tuning the experience to maximize engagement and immersive storytelling.

Technological Wizardry

L to R: Georgia Clarke (Head of Data Linguistics), Samuel Lo (Data Linguist)

Using SGX, generating facial animation is simple. Users import audio files with corresponding transcripts, then select the relevant character assets and language model. Upon processing, our system generates the correct pronunciation for the target language, including exactly where each individual sound occurs in the audio signal. This phonetic information is then used to generate the facial muscle movements that are most accurate for the language in question. What happens if a word has more than one meaning and pronunciation (e.g. tear)? The Speech Graphics language modules know all possible pronunciations of a word and choose the one that best matches the audio!
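To make that disambiguation concrete, here is a minimal Python sketch of the idea, using the two CMUdict-style pronunciations of "tear" and a toy alignment_score() standing in for a real forced-alignment likelihood. The lexicon and scoring function are illustrative only, not the actual SGX internals.

```python
LEXICON = {
    # Heteronyms map to multiple candidate pronunciations (ARPAbet-style).
    "tear": [
        ("T", "IH1", "R"),   # "tear" as in crying
        ("T", "EH1", "R"),   # "tear" as in ripping
    ],
}

def alignment_score(observed, candidate):
    """Toy stand-in for a forced-alignment likelihood: the fraction of
    candidate phones that match the phones decoded from the audio."""
    matches = sum(1 for a, b in zip(observed, candidate) if a == b)
    return matches / max(len(candidate), 1)

def best_pronunciation(observed_phones, word):
    # Pick the candidate whose phones align best with the recorded signal.
    return max(LEXICON[word.lower()],
               key=lambda c: alignment_score(observed_phones, c))

print(best_pronunciation(("T", "EH1", "R"), "tear"))  # ('T', 'EH1', 'R')
```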

Speech Graphics’ Data & Linguistics team, led by Georgia Clarke, continually refines our language packs through R&D as well as customer feedback.

Working with language modules is always a fun process. Every time, we do a deep dive into the phonetics of a language, and we end up finding something new and unexpected about it. We watch many videos of real people pronouncing different sounds and compare them with our animation. Sometimes we can go on for hours talking about how to make a single sound look better!
— Samuel Lo (Data Linguist)

Magic Unmasked

Jose Villeta, Director of Software Engineering at Warner Bros. Games, explains the technical intricacies of the combined procedural animation pipeline and how Speech Graphics software was integrated to deliver the desired localized animation.

Overview of Avalanche Software’s Procedurally Generated Facial Animation Pipeline

Avalanche Software’s internal pipeline for ‘Hogwarts Legacy’ was both complex and ambitious. We seamlessly integrated automated voice-over generation through Amazon Polly, optimized the processing of facial animations with Speech Graphics, and efficiently ingested assets into Unreal Engine, all within a fault-tolerant automated pipeline. This intricate process led to the creation of over 50,000 lines of dialogue per language in 8 languages, all while maintaining animation fidelity and language inclusivity.
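As an illustration of the voice-over step, here is a minimal sketch of rendering one placeholder dialogue line with Amazon Polly via boto3. The voice, sample rate, and file naming are assumptions; the production pipeline’s actual settings weren’t published.

```python
import boto3

polly = boto3.client("polly")

def synthesize_line(line_id: str, text: str, voice_id: str = "Joanna") -> str:
    """Render one dialogue line to a raw PCM file and return its path."""
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="pcm",   # raw 16-bit PCM; "mp3" and "ogg_vorbis" also work
        SampleRate="16000",
        VoiceId=voice_id,
    )
    path = f"{line_id}.pcm"
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
    return path
```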

The entire pipeline was automated using TeamCity as our continuous integration platform.

Our story writers’ ideas could be written into Articy and, within the same hour, visualized in game. This was simply magical.

Optimizations for Processing Massive Data

In order to handle the substantial volume of data required for our video game, we needed to optimize our pipeline. Through our partnership with Speech Graphics, we meticulously refined every facet of our facial animation creation pipeline. This not only resulted in significant resource savings but also advanced the procedural animation pipeline for future projects. To effectively manage millions of lines of dialogue during our iteration processes, we needed to scale up. We accomplished this by threading 40 instances of the Speech Graphics command line tool and running 36 headless Maya instances on a single workstation agent in TeamCity. Speech Graphics fine-tuned the command line program, and we also achieved significant speed improvements within the Speech Graphics plugin for Maya, particularly in the areas of baking and animation exporting.
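In the same spirit, here is a minimal sketch of that fan-out: a thread pool driving concurrent command-line instances, with each thread blocking on a child process. The `sgx` executable name and its arguments are hypothetical stand-ins for the real tool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

SGX_WORKERS = 40  # matches the instance count quoted above

def process_line(audio_path: str, transcript_path: str) -> int:
    # Each worker shells out to one command-line instance; threads suffice
    # here because the heavy lifting happens in the child process.
    cmd = ["sgx", "--audio", audio_path, "--transcript", transcript_path]
    return subprocess.run(cmd, check=False).returncode

def process_batch(lines):
    """lines: iterable of (audio_path, transcript_path) pairs."""
    with ThreadPoolExecutor(max_workers=SGX_WORKERS) as pool:
        return list(pool.map(lambda args: process_line(*args), lines))
```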

TeamCity automation

Concurrent Maya Speech Graphics processing

High Quality Localized Cinematics

With Speech Graphics, we seamlessly localized cinematics by blending English facial captures with generated localized lip-sync. By isolating the mouth muscles in-game, we could mask localized lip-sync over an English facial capture. This approach ensured animation fidelity across languages and saved valuable time and resources.
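Conceptually, the mask boils down to a per-channel switch. Here is a minimal sketch, assuming animation is stored as per-channel curves; the channel names are illustrative, not the game’s actual rig.

```python
# Muscle channels considered part of the mouth mask (illustrative names).
MOUTH_CHANNELS = {"jaw_open", "lip_corner_puller_L",
                  "lip_corner_puller_R", "lip_pucker"}

def masked_blend(english_capture: dict, localized_lipsync: dict) -> dict:
    """Each dict maps a muscle channel name to a list of per-frame values."""
    blended = dict(english_capture)       # start from the full-face capture
    for channel, curve in localized_lipsync.items():
        if channel in MOUTH_CHANNELS:
            blended[channel] = curve      # mask in the localized mouth
    return blended
```

A production version would typically cross-fade at the mask boundary rather than hard-swap channels, but the principle is the same.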

Leveraging Speech Graphics Metadata

The runtime usage of metadata mined from Speech Graphics Event files was a game-changer. We harnessed pitch, intensity, and prosody, along with word and pause alignments, extracted from the Event files. This data enabled us to automate body gestures that matched what the character was saying and to transition in and out of speech while maintaining facial emotions in the game, elevating character performance quality to unprecedented levels.
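As a toy illustration of metadata-driven gesturing, here is a sketch assuming a simplified event record with word timings and per-word intensity. The schema and gesture names are invented for the example; the real Event file format is richer.

```python
import json

# Intensity thresholds mapped to (hypothetical) gestures, checked high to low.
GESTURES_BY_INTENSITY = [
    (0.8, "emphatic_beat"),   # strong emphasis -> big beat gesture
    (0.5, "small_beat"),      # moderate emphasis -> subtle beat
    (0.0, None),              # low intensity -> no gesture
]

def gestures_for_line(event_json: str):
    """Return (start_time, gesture) cues for one dialogue line."""
    events = json.loads(event_json)
    cues = []
    for word in events["words"]:
        for threshold, gesture in GESTURES_BY_INTENSITY:
            if word["intensity"] >= threshold:
                if gesture is not None:
                    cues.append((word["start"], gesture))
                break
    return cues

line = ('{"words": [{"text": "expecto", "start": 0.10, "intensity": 0.9}, '
        '{"text": "patronum", "start": 0.62, "intensity": 0.4}]}')
print(gestures_for_line(line))  # [(0.1, 'emphatic_beat')]
```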

We developed internal tools to help mine the metadata and generate config files used by the pipeline.

Python Tool to mine data from SGX Event files and generate metadata configuration files used by the automated SGX pipeline.

View any Event file’s metadata.

Visualize bucket hits by character along with intensity.



In-Game Internal Tool for Speech Graphics Transcript Mark-Up

Our collaboration extended beyond animation generation. We developed in-game tools for our artists to mark up Speech Graphics transcripts, facilitating quick iteration while visualizing results. This level of creative control was invaluable, allowing us to fine-tune facial expressions and emotions with precision. (SGX 4, released after this production, now provides a graphical user interface called SGX Director for doing this editing on the timeline.)

Transcript Mark-Up Tool

Additional Internal Tools

Reporting

We had dozens of nightly reports generated to help us monitor our facial animation pipeline. Some of these reports are shown below. 

Audio Stats Report

SGX Confidence Report

SGX Manual Marked Up Transcript Collision Report

The last report helps us identify manually marked-up transcripts where the original source transcript has changed, meaning an artist needs to re-mark up the updated transcript.
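One way to implement such a collision check is to fingerprint the source transcript at markup time and flag any line whose source has since changed. A minimal sketch, with illustrative field names:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Stable hash of a transcript's normalized text."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def find_collisions(markups: dict, sources: dict) -> list:
    """markups: line_id -> hash of the transcript at markup time;
    sources: line_id -> current source transcript text.
    Returns the line IDs whose markup is now stale."""
    return [
        line_id
        for line_id, marked_hash in markups.items()
        if fingerprint(sources[line_id]) != marked_hash
    ]
```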

Game Editor Tool: Character Creator

This tool has many uses, but one major use was its ability to play back any character’s Speech Graphics-generated dialogue lines. It also allowed us to display the mined metadata on screen, timed to the lip-sync.

See Emotion, Word Bucket, Pause, Tone, and Intensity metadata firing in the on-screen log in real time.

Choose any character and any dialogue line to visualize the Speech Graphics-generated facial animation in the editor.


We are proud of the groundbreaking results achieved in "Hogwarts Legacy," and the success of this project has set a new standard for procedural facial animation in the gaming industry. Speech Graphics’ focus on character design intricacies has delivered high-quality character fidelity and language inclusivity without compromising on animation quality.
— Avalanche Software


The SGX suite delivers high quality automated facial animation from audio on first pass, and gives you control to direct the resulting performance in line with your precise vision. Malleable to any pipeline, any characters, at any scale.
