App Review: How I use HeyGen to create my virtual clone that speaks any language
One of the things that fascinates me most about AI is being able to create personalised content at scale. When the first applications capable of lip synchronisation, voice cloning and video generation started to appear, I was eager to test them. After one year of researching and trying new tools, I have decided to write a review of my favorite one: HeyGen. Before I start, I would like to make a disclaimer: I didn’t receive any money, and I don’t know anyone who works for this company. I just liked the product. There are many other tools that can generate virtual avatars with good quality, but for my personal reasons, the one that worked best for me was HeyGen.
The idea of a virtual clone is very simple: it’s an application that creates video content of you speaking directly to the camera, saying whatever text you ask it to. It is very useful for scaling the creation of video content, for example for a YouTube channel, a company instructional video or onboarding for new employees. You can also create celebration videos, like I did when I said happy new year in Russian to a Russian friend even though I don’t speak Russian:
In this video I used my AI virtual avatar to say happy new year in Russian to a Russian friend. It worked very well.
As you can see, the lip synchronisation was very good. It is one of the areas where I have seen other applications struggle the most. It was also capable of reproducing my voice tone very well: it’s exactly how I would sound when trying to speak Russian. This is one of the hardest parts of voice cloning, because the app takes a small sample of your voice (in my case, speaking in English) and is capable of sounding like me speaking other languages. I have seen so many applications that don’t work at all in other languages, or that perform very poorly, even in the original language.
How to make your virtual clone
The app uses an AI model that learns your speaking patterns and your appearance while speaking from a sample video you have to record. Your sample video must follow strict guidelines for the AI to be accurate. They are the following:
- It has to be 2 to 5 minutes long
- You need to use a high-resolution camera
- The recording must be made in an environment with good lighting and without background noise
- You need to look directly at the camera
- You need to pause between each sentence and keep your mouth closed during these pauses
- You need to make generic gestures (natural ones) but keep your hands below the chest
- You need to be in the center of the video, fully visible from the waist up
Just to give you an example, here is the sample video I recorded to train the AI for my clone:
I’m not gonna lie, it’s weird to look at the camera for almost 3 minutes trying to think of things to talk about while keeping in mind all those rules for the sample video, but in the end it worked fine.
After uploading the sample video, you have to pass a security verification to prove that you are the person in the sample video and not trying to create a clone of someone else. It’s a simple step where you record another video of yourself saying some sentences generated by the app for security reasons. After a few minutes, you are ready to go.
The app interface is quite simple. It has many features, but I decided to focus on a simple generation of a video of my avatar speaking in other languages. Remember that my sample video was recorded in English. I was eager to see if HeyGen would be able to reproduce my accent in other languages, which is one of the hardest features in this kind of application. SPOILER: it worked!
Here is an example of me speaking in French:
Not only was the lip synchronisation really good, but my voice tone also remained the same. I really liked the result. I then decided to try a language that is not so often supported by these applications. I have a friend who speaks Hungarian and could assess how good the speech was in this language, so I gave it a try:
Again the result was very good. The lip sync, my voice tone, the speech. It was incredible to achieve this precision with just a two-and-a-half-minute training video.
Conclusion:
Even though there are still some limitations in this app, I believe what was accomplished with so little effort shows an incredible evolution in AI for content generation and scalability. As I stated before, there are many applications capable of providing lip sync, voice cloning and AI avatars. Each platform has its own way of providing these services. The idea of this post is not just to highlight this specific platform, but to give you an idea of what kind of possibilities we have from AI-powered applications today. Hope you enjoyed my post.