Sansar Opens to the Public in Creator Beta

SAN FRANCISCO – July 31, 2017 – Sansar™, the world’s leading social VR platform, today opened its creator beta to the public. Sansar empowers individuals, communities, schools, studios, corporations, brands and others to easily create, share, and ultimately sell immersive 3D social experiences for consumers to enjoy on HTC Vive, Oculus Rift, and Windows PCs. Developed by Linden Lab®, the company behind the groundbreaking virtual world Second Life®, Sansar is a brand-new platform built from the ground up to enable everyone to become a creator.

At opening, Sansar’s Atlas directory already features hundreds of engaging virtual experiences, including multiplayer games, recreations of historic sites and landmarks, art installations, movie theaters, museums, narrative experiences, jungle temples, 360º video domes, sci-fi themed hangouts, and much more. Creators invited to the platform during a limited-access preview have published thousands of amazing public and private experiences, and with the opening of beta today, the world is now invited to join them.

“Sansar democratizes social VR,” said Ebbe Altberg, CEO of Linden Lab. “Until now, complexity and cost have limited who could create and publish in this medium, and Sansar dramatically changes that. It’s been inspiring to see the thousands of virtual creations that have already been published with Sansar during our limited preview, and I’m looking forward to the explosion of creativity we’ll see now that we’ve opened the doors in beta.”

Sansar Capabilities

Simplified Creation & Cross-Device Distribution

Intuitive drag-and-drop editing makes it easy to create a scene with assets imported from common 3D modeling tools or purchased from the Sansar Store.

With the push of a button, creations become hosted multi-user experiences that can be enjoyed by consumers using VR head-mounted displays (HMDs) as well as in desktop mode on PCs. Every Sansar experience has a unique link that can be shared via Facebook, Twitter, email, blogs, and with whomever the creator wishes. Each instance of an experience is currently set to allow 35+ concurrent avatars, and automated instancing will enable creators to reach unlimited audiences.

Convincing Social Interactions

With detailed customizable avatars, Sansar provides rich social interactions, without requiring additional hardware like cameras and trackers. A unique integration with Speech Graphics’ technology provides accurate avatar lip-syncing and facial animations, driven in real time as users simply speak into the microphones on their HMDs or audio headsets. With the use of VR hand controllers, users’ hand and arm movements are accurately and realistically mirrored by their avatars, thanks to an integration of IKinema’s powerful RunTime middleware, the world’s leading full-body inverse kinematics (IK) technology.

Robust Marketplace & Earning Opportunities

With Sansar, creators can earn from their virtual creations by selling them in the Sansar Store. In the future, creators will also be able to sell, rent, or charge for access to their experiences. At the opening of beta, the Sansar Store features thousands of items for sale from creators around the world.

A relationship with TurboSquid provides creators with access to hundreds of additional high-quality 3D models in the Store today, with thousands more being added in the coming months. Planned integration with TurboSquid’s StemCell initiative will make it easy for TurboSquid’s community of 3D modelers to immediately upload and sell their creations in the Sansar Store, further augmenting the assets readily available to Sansar creators.

Develop Awards 2017 Nominee

Game industry website Develop has shortlisted Speech Graphics for an Industry Excellence Award. Our category is Technology Provider, and we were nominated for helping British game developer Paper Seven bring the main character of Blackwood Crossing to life using our technology. Microsoft studio The Coalition also made use of SGX for the in-game facial animations in Gears of War 4.

The awards show will be held in Brighton in July during the Develop Conference.

A Case for Utilising Automation and Machine Learning for Large-Scale Content Production

Internal and external views of Speech Graphics facial animation output

The recent headlines surrounding Mass Effect Andromeda’s quality of facial animation have brought to light some significant challenges in modern video game production. Developers are in a constant arms race for higher-fidelity graphics, while players and users in general demand deeper and longer content. These trends are reflected in game development budgets, where art production often represents over 60% of total costs. This is compounded by the fact that iteration is a huge part of the development process: many art assets and animations are created, recreated, discarded, and then recreated again. While iteration is an important and required part of the creative process, art creation is still extremely labour-intensive, so any iteration will increase time and costs significantly.


Facial animation, and in particular lip-sync, is a very difficult art asset to create, as it requires interdisciplinary knowledge and input. Speech articulation is the most complex muscle movement that humans can perform. In the past, and still often today, dialogue is animated by hand from a recorded soundtrack, with each sound carefully posed by an animator. For a high-end production, a skilled animator can produce about three to five seconds of facial animation a day.


Facial motion capture is a semi-automated approach in which a camera tracks the movements of the face. While much faster than traditional hand animation, and capable of great results for gross facial movement, it is error-prone for lip-sync because of occlusions of the inner contour of the lips. These occlusions then have to be manually corrected, or the lip-sync looks off because the lips never come together for sounds like ‘m’ or ‘p’. Both techniques can produce very compelling results if enough man-hours are spent on animation and clean-up, but time is often in short supply.


Games such as Mass Effect, Fallout, or Skyrim have over a hundred thousand lines of dialogue, all of which need to be animated. In addition, the story told through all that dialogue will most likely change over the course of the project, requiring rewrites and re-recordings with voice actors. Then, once the development team is happy with the results in the development language, which is often English, all that recorded content will need to be localised, which requires recording voice actors in ten or more languages. That localised dialogue will either have to be dubbed, meaning the voice actor tries to match the timing of the original animation, or the faces will have to be reanimated so the lip-sync matches the localised dialogue.


This example alone shows that faces in particular create many challenges in art production and the creative process, which makes the case for a fully automated solution extremely strong. To overcome the challenges posed by dialogue animation in modern game production, developers need a solution that produces accurate and compelling facial animation automatically from speech input in any language.


Recent advances in machine learning such as deep learning, the ever-increasing computing power described by Moore’s law, and the advent of GPU-driven computing make it possible to significantly reduce labour-intensive processes in art content production. Procedural content production comes to mind: it can create simple content, such as bananas in different shapes and shades of yellow, from a single banana example by varying colour and curvature. Deep-learning-based approaches, on the other hand, make for a much more compelling solution. Multi-layered deep neural networks learn levels of abstraction and representation that make sense of data such as art assets.
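To make the procedural idea above concrete, here is a minimal sketch (with invented parameter names and ranges, not any real pipeline) that derives new banana variants from a single example by jittering its curvature and its shade of yellow:

```python
import random

def make_banana_variant(base_curvature=0.35, base_shade=(255, 225, 80), rng=None):
    """Derive a new banana from a single example by jittering its
    curvature and RGB shade (illustrative parameters only)."""
    rng = rng or random.Random()
    # Bend of the fruit: +/-20% around the example's curvature
    curvature = base_curvature * rng.uniform(0.8, 1.2)
    # Shade of yellow: +/-10% per channel, clamped to valid RGB
    shade = tuple(min(255, max(0, int(c * rng.uniform(0.9, 1.1))))
                  for c in base_shade)
    return {"curvature": round(curvature, 3), "shade": shade}

# Generate a handful of variants from one example, reproducibly
rng = random.Random(42)
variants = [make_banana_variant(rng=rng) for _ in range(5)]
for v in variants:
    print(v)
```

The limitation is visible in the code itself: the artist must hand-pick which parameters vary and by how much, which is why this approach only scales to simple content.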


A deep learning system, once it has seen enough training examples, can in theory produce an unlimited amount of complex content by automatically learning the relationships between thousands of input parameters. Gas stations in shooters are an interesting example: almost every game has one, and each was probably designed from scratch by an artist. Fed labelled examples of gas stations, a deep learner will automatically determine their distinguishing features, such as mesh structure, intersections, and height-to-length-to-width ratio, and let a user create an infinite number of new gas stations automatically by giving the system a constrained set of input parameters such as polygon count and footprint.
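A toy sketch of that idea, with invented data and a deliberately tiny network: a two-layer neural net learns a mapping from two normalised input parameters (stand-ins for polygon budget and footprint) to two derived asset features. Everything here, including the feature names, is illustrative; a real system would train on thousands of labelled assets rather than a synthetic function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "asset" data: two input parameters -> two derived features.
X = rng.uniform(0.0, 1.0, size=(200, 2))            # [polygon_budget, footprint]
Y = np.stack([X[:, 0] * X[:, 1],                    # pretend "roof area"
              0.5 * X[:, 0] + 0.5 * X[:, 1]], 1)    # pretend "pump count"

# Two-layer network: 2 inputs -> 16 hidden tanh units -> 2 outputs.
W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 2)); b2 = np.zeros(2)

def forward(X):
    H = np.tanh(X @ W1 + b1)        # hidden layer: learned intermediate features
    return H, H @ W2 + b2

losses = []
lr = 0.1
for step in range(2000):
    H, pred = forward(X)
    err = pred - Y
    losses.append(float((err ** 2).mean()))
    # Backpropagation by hand (full-batch gradient descent)
    dW2 = H.T @ err / len(X); db2 = err.mean(0)
    dH = err @ W2.T * (1 - H ** 2)   # tanh derivative
    dW1 = X.T @ dH / len(X); db1 = dH.mean(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Unlike the procedural sketch, nothing here tells the network what "roof area" depends on; the relationship between inputs and outputs is learned entirely from examples, which is the property that makes the approach scale to complex assets.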


The key is to have the system learn from the right data. As with any machine learning approach, the principle of garbage in, garbage out holds for deep learning as well: without seeing good art, the system will not be able to produce good art. Therefore a lot of skill and knowledge goes into preparing the training data to make sure it accurately represents the space that needs to be modelled, be it bananas, gas stations, or facial animation.


The trick is that many development studios already have a lot of good examples of art assets in their back catalogue. In principle this would allow them to use their own previous assets, including vast amounts of recorded dialogue and animation, as training data for a deep learning system. However, the challenge for facial animation in particular is that it requires learning a mapping from one complex space, speech, to another abstract space, facial animation. This is further complicated by the fact that, in contrast to gas stations, speech and animation are not stationary: they change rapidly and widely over time.


The good news is that deep learning excels at complex mapping tasks and can create extremely powerful models given enough training data. There is a vast body of research from speech recognition and speech synthesis that provides a good foundation for modelling speech; combined with expert knowledge in computer graphics, it enables the training of systems that automatically produce facial animation from speech input alone. Furthermore, the same principle of constraining machine learning with expert knowledge can be applied to any type of asset, giving developers the ability to create vast amounts of content more quickly and cheaply while saving resources in the process.
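To make the speech-to-animation mapping concrete, here is a drastically simplified sketch: per-frame audio energy drives a single "jaw open" animation channel. A production system maps rich acoustic features to many facial muscle channels; this stand-in, with invented function names and a synthetic signal, only illustrates the frame-wise audio-to-animation idea:

```python
import math

def frame_energies(samples, frame_len=160):
    """Split a mono signal into fixed-length frames and compute RMS energy per frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]

def jaw_open_curve(samples, frame_len=160):
    """Map per-frame energy to a 0..1 'jaw open' animation channel,
    normalised so the loudest frame fully opens the jaw."""
    energies = frame_energies(samples, frame_len)
    peak = max(energies) or 1.0
    return [e / peak for e in energies]

# Synthetic 'speech': a burst of a 200 Hz tone surrounded by silence, at 16 kHz
sr = 16000
samples = [0.0] * 1600
samples += [0.6 * math.sin(2 * math.pi * 200 * t / sr) for t in range(3200)]
samples += [0.0] * 1600

curve = jaw_open_curve(samples)
print(f"{len(curve)} frames, peak jaw value {max(curve):.2f}")
```

Real speech, of course, requires distinguishing which sound is being made, not just how loud it is, which is where the learned acoustic models discussed above come in.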

Game studios can now implement machine learning and deep learning in their art production pipelines to meet the challenges posed by continually increasing graphics requirements and consumers’ demand for immersive, high-quality content. This is especially true for time-intensive production such as facial animation, which plays a significant part in player engagement through storytelling. Third-party vendors such as Speech Graphics are bringing their expertise in deep learning, not generally found in the game industry, to elevate those thousands of lines of dialogue animation to the standard that modern immersive game productions demand and that players expect.

Speech Graphics at GDC 2017

Speech Graphics will be at the Game Developers Conference (GDC) in San Francisco, February 27 to March 3.

We are presenting an exclusive preview of our production software SGX 2.0 which will include the following features:

  • Automatic generation of facial expression and lip-sync from audio
  • Improved silence/speech detection
  • All-in-one executable with fewer input arguments and more flexible organization of input
  • Ability to create your own “expression library” through our Maya plugin
  • Improved quality of shouted speech animation
  • New prosody analysis that can tune itself from a single utterance
  • Transcription-less animation

Get in touch here to book a meeting at our suite in the Marriott Marquis.