Interspeech 2022: An Emotional Experience!

Interspeech is the pre-eminent annual speech technology conference, covering speech synthesis, speech recognition, dialogue modeling, emotion in speech, and many other areas that bring human speech into applications. This year, the 23rd Interspeech Conference was held Sept 18-22 in Incheon, South Korea.

Accompanied by Speech Graphics’ CTO Michael Berger and Lead Data Engineer Georgia Clarke, our Director of Machine Learning, Dimitri Palaz, gave a talk on our research paper entitled ‘Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning’.

L-R: Dimitri Palaz, Georgia Clarke and Michael Berger get emotional at Interspeech 2022.

Leveraging the team’s pioneering expertise in audio-driven facial animation, the Speech Graphics paper sets the scene as follows:

“There is more to facial animation than just the quality of the lip-syncing. Believable head motions, small eye movements, and emotive expressions are also essential for a realistic viewing experience. In order to generate believable emotive expressions for characters, it is important to know exactly when and which emotions should be in the animations at any given time. This can be achieved by expensive and slow hand-annotation, or by using information in the speech signal the character will be presenting to detect which emotions are present at a given point in the utterance.

Speech Emotion Recognition (SER) is a widely studied and surprisingly difficult task within the speech community. In its broadest sense, SER deals with the problem of identifying and classifying emotional aspects of human speech.”
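To make the quoted framing concrete, here is a deliberately simplified, stdlib-only sketch of SER as a classification problem. This is purely illustrative and is not the paper's multi-task/adversarial method: the features (short-time energy, zero-crossing rate), thresholds, and emotion labels are all assumptions chosen for the toy example; real systems learn the mapping from audio to emotion with neural networks.

```python
import math

def frame_features(samples, frame_size=160):
    """Yield (energy, zero_crossing_rate) for each frame of a waveform."""
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_size - 1)
        yield energy, zcr

def classify_emotion(samples):
    """Map average energy/ZCR to a toy emotion label via a hand-set rule."""
    feats = list(frame_features(samples))
    if not feats:
        return "unknown"
    mean_energy = sum(e for e, _ in feats) / len(feats)
    mean_zcr = sum(z for _, z in feats) / len(feats)
    # Loud, rapidly oscillating speech -> 'excited'; otherwise 'calm'.
    return "excited" if mean_energy > 0.1 and mean_zcr > 0.05 else "calm"

# Synthetic utterances at 16 kHz: a loud 440 Hz tone vs. a quiet 80 Hz tone.
loud = [0.8 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
quiet = [0.05 * math.sin(2 * math.pi * 80 * t / 16000) for t in range(16000)]

print(classify_emotion(loud))   # excited
print(classify_emotion(quiet))  # calm
```

The interesting part, and the focus of the paper, is that hand-set rules like this fail "in the wild", where recording conditions and speakers vary far beyond the training data.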

Sign up to receive the entire paper: ‘Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning’

The conference featured a great array of talks, including a thought-provoking keynote by Rupal Patel on the future of AI voice. That field is having a moment, as synthetic voices approach the point of being indistinguishable from natural ones.

New neural network architectures and training methods drove most of this progress. Emotion detection was a hot topic this year, and researchers are even attempting to detect complex states such as confusion and irony from the voice alone.

CTO Michael Berger said: “Without prejudice, I thought Dimitri gave the best talk of the conference in the area of Speech Emotion Recognition, not only for its theoretical and experimental content but because it was one of the few papers addressing how to accomplish this difficult task in the real world, as opposed to evaluating on laboratory data very similar to the training data.”

Dimitri Palaz delivers Speech Graphics’ paper on SER.

“It was great to be able to attend the Interspeech conference in person this year,” said Dimitri. “The oral presentations and the posters were very interesting, and it was a pleasure to discuss the work directly with the authors. Our presentation was also well received, and we had interesting follow-up conversations. Overall, it is always a pleasure to participate and engage with the speech research community, and we are looking forward to Interspeech 2023 in Dublin!”

Georgia echoed these sentiments: “I felt very lucky to be able to attend Interspeech in South Korea this year. After two years of remote conferences, the atmosphere was fantastic! I’m immensely proud of the team’s hard work getting our paper accepted, which was really well received, and it was great to see the wide range of research on show. As always, I came away with lots of ideas and renewed vigour, highlighting the importance of our continued participation in this great research community. Looking forward to next year!”

Thanks, Interspeech; it’s been emotional!
