Developer shows off new system that can make you say sentences you never said.


“I’m magic,” he said with a smile.

Developer Zeyu Jin was at the Adobe MAX creativity conference last week in Las Vegas, demonstrating how a new voice editing system called VoCo can create new words and sentences from your voice.

“You a demon!” yelled one of the conference hosts, comedian Jordan Peele.

Jin had just taken a recorded phrase from Peele’s sketch comedy partner, Keegan-Michael Key, and changed it into something new.

With a few keystrokes, Key’s joke, “I jumped on the bed and kissed my dogs. And my wife. In that order,” changed to, “I kissed Jordan three times.”

Peele reacted in mock horror, yelling, “You a witch!”

“We can actually type something that’s not there,” Jin said.

“You could get into big trouble for something like this,” Peele said. “If this technology gets into the wrong hands…”

That is a real concern for some who work in the area of security, privacy and ethics—could someone take this “magic” voice trick and re-work your words into real-sounding sentences or conversation?

“It seems that Adobe’s programmers were swept along with the excitement of creating something as innovative as a voice manipulator, and ignored the ethical dilemmas brought up by its potential misuse,” Dr. Eddy Borges Rey with the University of Stirling in Scotland told the BBC.



Comedy team Keegan-Michael Key (left) and Jordan Peele (right). Photo credit: Keved / CC BY


Real or fake?

Jin told the crowd that VoCo is meant to help editors who need to fix a voice error in an editing project.

“Like audio books, podcasts,” he said. “Really not for bad stuff.”

And Adobe’s web site echoes the same goals for VoCo.

“You send your client a completed video project and they ask you to make a last minute change to the voiceover…but the voiceover artist is already on a plane to Hawaii,” the site said.

But security experts also see the opportunity for misuse.

“There is the potential that voice editing systems could be used in attempts at masquerading as other individuals,” said Matt Lewis with cybersecurity company NCC Group.

“The immediate concern might be in targeted social engineering attacks, perhaps where attackers pretend to be friends or loved ones of victims in attempts at eliciting secret information from them or coercing them into performing certain actions,” he told Archer News.

Stealing your voice could be convenient for crooks carrying out the Grandparents Scam, or trying to steal money by staging fake kidnappings, where they call and pose as a loved one being held hostage. 

Your voice, your money

Now that banks and other companies use your voice as a way to prove who you are, “putting words into someone’s mouth” could be a new method of cyber theft.

“The other potential abuse case is fooling a voice biometric system such as a telephone banking application,” said Lewis.

Some major banks, including Capital One, Barclays, Citigroup and ING, use voice authentication or are planning to do so.

But Lewis said the success of masquerade attacks on voice biometric systems would depend on a number of factors.

For example, your real voiceprint—as seen on a computer—might look different from the fake voiceprint.

“While to the human ear, output from systems such as VoCo might be almost identical to the real spoken word of an individual, at digital sample level, the outputs could look radically different,” he explained. “And so voice biometric systems might not be so easily fooled owing to the variation in voice characteristics produced by editing systems, including tone, pitch, cadence, rhythm, volume, etc.”



Digital samples might show the difference between a real voice sentence and an edited voice sentence.



20 minutes

The developer said VoCo needs about 20 minutes of your speech to be able to fake your words and sentences.

That could be a challenge for criminals.

“Systems like VoCo need big sample sets in order to be as accurate as possible,” Lewis said. “Gaining such a large sample from a victim without their knowledge wouldn’t be impossible, but would be difficult.”

Another obstacle—a secret passphrase, like a verbal PIN.

“Some voice recognition systems allow users to choose their own secret passphrase which they repeat when accessing the system,” said Lewis.

“If an attacker doesn’t know the secret passphrase, then being able to reconstruct a victim’s voice might not be that useful without the contextual wording,” he added.

Keeping the bad guys away

A spokesperson for Adobe said VoCo is not available to the public right now.

“Project VoCo was shown at Adobe MAX as a first look of forward-looking technologies from Adobe’s research labs and may or may not be released as a product or product feature,” Kiersten Olsen told Archer News. “No ship date has been announced.”

Jin reassured the host and the audience at Adobe MAX that developers are investigating crime-stopping features as well.

“Don’t worry, don’t worry,” he said. “We actually have researched how to prevent forgery. Think about watermarking and detection.”

As researchers try to find ways to make the fake sentences sound real, they also look for ways to brand them as fakes, according to Jin.

“We’re working harder on trying to make it detectable,” he said.



VoCo researcher Zeyu Jin (far left). Photo credit: vernieman / CC BY


How secure?

Your voice will be used like a password for more than just banking.

Revenue coming in from speech and voice biometrics will increase more than twenty times, from about $250 million last year to more than $5 billion in 2024, according to market research company Tractica.

But is it safe?

“Voice authentication technology is constantly improving and modern systems employ quite effective techniques that can identify replay attacks or mimicry,” Lewis said.

One such technique—“adaptive enrollment,” where the authentication program takes a new sample every time you call in.

So, if someone is using a voice replay system to fake your voice passphrase, the program should notice that the replay is exactly the same as a previous voice sample.

“A system such as VoCo won’t change how the output is spoken—it will be the same each time—and so modern voice recognition systems are able to detect samples that are suspiciously similar to previous authentication attempts which is usually indicative of some sort of replay attack,” Lewis said.
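The check Lewis describes can be pictured with a small sketch. This is only an illustration under loud assumptions, not how any bank or vendor actually does it: the voice samples are stand-in lists of numbers, the function names are hypothetical, and the 0.9999 cutoff is made up for the example. Real systems would compare far richer acoustic features.

```python
import math

# Hypothetical cutoff: anything this close to an earlier sample is suspicious,
# because a live human never says a phrase exactly the same way twice.
REPLAY_THRESHOLD = 0.9999


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature lists."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero sample cannot match anything
    return dot / (norm_a * norm_b)


def looks_like_replay(new_sample, previous_samples, threshold=REPLAY_THRESHOLD):
    """Flag a sample that is suspiciously similar to any earlier attempt."""
    return any(
        cosine_similarity(new_sample, old) >= threshold
        for old in previous_samples
    )


# Stand-in features from an earlier, genuine call (assumed values).
earlier_calls = [[0.10, 0.52, -0.31, 0.77]]

# An identical copy is flagged; a slightly different live attempt is not.
print(looks_like_replay([0.10, 0.52, -0.31, 0.77], earlier_calls))
print(looks_like_replay([0.12, 0.47, -0.35, 0.70], earlier_calls))
```

The idea is the reverse of a normal match: a sample that is too perfect a copy of an earlier one is treated as a warning sign, not as proof of identity.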

What you need to know

VoCo may never hit the market. But there is still risk, according to Lewis.

Biometrics—like your voice—are not something you can hide, so someone could always try to record you and use it against you.

“Just be aware that biometrics are not secret and so the more online presence you might have with spoken word, the more vulnerable you might be to potential masquerade attacks,” he said.

He recommended that you choose a “unique and obscure” spoken phrase for your passphrase when you enroll in a voice recognition system, so people can’t guess it easily.

Voice recognition can still help you, even if it is not a reliable authenticator, according to Lewis.

“In terms of security, voice recognition does offer great benefits in fraud detection,” he said. “It can be used as an early indicator of fraudulent attempts, which renders it useful in financial applications, for example, where detecting and blocking fraud is paramount.”



Capital One advertises on its site that you can check your balance & pay bills just by speaking through the Amazon Alexa voice system.


New era, less trust

The concept of voice editing is not new. What VoCo offers, however, is a way for you to easily type in new phrases that someone never said and create a realistic voice recording in seconds.

“We have already revolutionized photo editing,” the developer said. “Now it’s time for us to do the audio stuff.”

“I’m blown away. I can’t believe that’s possible,” Peele said.

It’s like Photoshop for voice—just as someone could paste your head onto a nude person’s body and distribute the picture across the Internet, quick and easy voice editing could allow someone to make you say things you never dreamed of.

“Let’s do something to human speech,” Jin said jokingly, “like changing what you have said in your wedding.”

Not magic, but a milestone in technology. In the future, people will have an easy way to put words in your mouth—and you won’t be able to trust your ears.