I just watched the newly released trailer for Quantic Dream’s upcoming psychological thriller, Beyond: Two Souls. Quantic Dream is pushing the boundaries of facial modeling and likeness. Willem Dafoe is brilliant, and they did a fantastic job using in-house shape-capture technology to acquire the shapes and basic deformations of his face. But what about the motion?
This interview with Dafoe shows the facial motion capture setup: standard marker tracking. Unfortunately, it suffers from the usual drawbacks of using optical motion capture for speech:
1. The lower face looks wooden and rigid because of several limitations: sparse marker coverage, signal filtering that smooths out noise (and some of the real motion along with it), and the inherent difficulty of retargeting marker motion onto the facial model’s control parameters. The resulting mouth movements are too slow, too small, or both, so the speech is under-articulated. The character LOOKS like he’s mumbling even though he doesn’t sound that way, which creates a mismatch between what the viewer sees and what they hear.
2. Critical speech events are missed because the inner lips and teeth can’t be tracked optically. For example, to produce an F or V, the lower lip must rise and press against the upper teeth. That contact never shows up in motion-capture data, because markers placed around the outer edge of the mouth cannot register it.
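To make the filtering point in item 1 concrete, here is a minimal sketch of how a smoothing filter treats slow and fast motion differently. It makes several loud assumptions that are not from the trailer or from any particular capture pipeline: a 60 fps capture rate, a simple 5-frame centered moving average as the smoothing filter, and pure sinusoids standing in for a slow head movement (1 Hz) and for lip opening and closing at a syllable rate of roughly 5 Hz. The function names are hypothetical illustrations.

```python
import math

def moving_average(signal, window):
    """Centered moving average; near the ends the window is truncated."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def sine(freq_hz, fs_hz, seconds):
    """Unit-amplitude sinusoid sampled at fs_hz (a stand-in for marker motion)."""
    n = int(fs_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]

def retained_fraction(freq_hz, fs_hz=60, window=5, seconds=2.0):
    """Peak amplitude after smoothing divided by peak amplitude before.

    Assumed values: 60 fps capture and a 5-frame window. Samples within
    `window` frames of either end are ignored so the truncated edge
    windows do not skew the comparison.
    """
    raw = sine(freq_hz, fs_hz, seconds)
    smoothed = moving_average(raw, window)
    core = slice(window, len(raw) - window)
    return max(map(abs, smoothed[core])) / max(map(abs, raw[core]))

for label, freq in [("slow head motion", 1.0), ("syllable-rate lip motion", 5.0)]:
    print(f"{label} ({freq} Hz): {retained_fraction(freq):.0%} of amplitude kept")
```

Under these assumptions the 1 Hz motion passes through almost untouched, while the 5 Hz motion loses roughly a quarter of its amplitude. The same filter that removes jitter also flattens the fastest articulatory gestures, which is one way smoothing can contribute to the under-articulated look described above.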
Not to boast, but procedural, audio-based animation remains the best solution for the lower face, especially when you are aiming for this level of realism. Gaps in dynamic realism (how the face moves) become more noticeable when the static realism (how the face looks) is this high.