Photoshop trolls can manipulate photos, but in the future we may have a new type of troll … trolls who can easily manipulate spoken words just by typing text into an audio editing program.

Last week at the Adobe Max Creativity Conference, Adobe developer Zeyu Jin mentioned that people “have been making weird stuff online” with photo editing software, before adding that now, “let’s do something to human speech.”

Jin gave a sneak peek of software that is like Photoshop for audio, demonstrating Project VoCo, for voice conversion, by altering a voice clip of comedian Keegan-Michael Key. The voice clip was something Key said after being nominated for an award.

Key had said, “I jumped on the bed, and I, uh, kissed my dogs and my wife – in that order.” Jin homed in on the “kissed my dogs and my wife” portion.

Jin showed that changing what Key said is as simple as typing in new text. He changed the order of Key’s kissing statement to “kissed my wife and my wife” and then to “kissed my wife and my dogs.”

To show that the voice conversion would work with words that had not been spoken, Jin inserted a reference to Max Creativity Conference host Jordan Peele, who is also Key’s comedic partner. Jin edited the audio so Key said “kissed Jordan and my dogs” and “kissed Jordan three times.” You can hear that around 3:55 in the video below.

Adobe issued a statement explaining the purpose of VoCo:

When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative. We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover.
The algorithm does the rest and makes it sound like the original speaker said those words.

TechCrunch explained how VoCo works:

Project VoCo needs about 20 minutes of voice samples from a given speaker. It then analyzes the speech, breaks it down into phonemes, transcribes it and creates the voice model. If you listen closely, you can hear when a word has changed, but it’s probably only a matter of time before you won’t be able to distinguish the actual recording and the edited (or completely fake) one.

It’s also likely the amount of voice data needed will decrease. At this stage, Adobe considers VoCo to be “experimental technology.”

Toward the end of the presentation, Jin noted that Adobe has already considered how VoCo could be abused. The company has “researched how to prevent forgery. Think about watermarking detection. As we’re getting the results much better, making it so people can’t distinguish between the fake and the real one, we’re working harder trying to make it detectable.”

Adobe hasn’t said when VoCo will be made available as part of Creative Cloud, although the Adobe blog pointed out that many sneak peeks “from previous years have later been incorporated into our products.”

If interested, you can find out more by reading the Princeton University and Adobe Research paper (pdf) released in March at the IEEE International Conference on Acoustics, Speech and Signal Processing.