
+++
title = "Ethical questions raised over use of AI to recreate Bourdain’s voice in new docu"
date = "2021-07-24T09:49:22+08:00"
type = "blog"
banner = "img/banners/banner-3.jpg"
+++

## Ethical questions raised over use of AI to recreate Bourdain’s voice in new docu


AP — The revelation that a documentary filmmaker used voice-cloning software to make the late chef Anthony Bourdain say words he never spoke has drawn criticism amid ethical concerns about use of the powerful technology.

The movie “Roadrunner: A Film About Anthony Bourdain” appeared in cinemas Friday and mostly features real footage of the beloved celebrity chef and globe-trotting television host before he died in 2018. But its director, Morgan Neville, told The New Yorker that a snippet of dialogue was created using artificial intelligence technology.

That’s renewed a debate about the future of voice-cloning technology, not just in the entertainment world but in politics and a fast-growing commercial sector dedicated to transforming text into realistic-sounding human speech.

“Unapproved voice cloning is a slippery slope,” said Andrew Mason, the founder and CEO of voice generator Descript, in a blog post Friday. “As soon as you get into a world where you’re making subjective judgment calls about whether specific cases can be ethical, it won’t be long before anything goes.”

Before this week, most of the public controversy around such technologies focused on the creation of hard-to-detect deepfakes using simulated audio and/or video and their potential to fuel misinformation and political conflict.


But Mason, who previously founded and led Groupon, said in an interview that Descript has repeatedly rejected requests to bring back a voice, including from “people who have lost someone and are grieving.”

“It’s not even so much that we want to pass judgment,” he said. “We’re just saying you have to have some bright lines in what’s OK and what’s not.”


Angry and uncomfortable reactions to the voice cloning in the Bourdain case reflect expectations and issues of disclosure and consent, said Sam Gregory, program director at Witness, a nonprofit working on using video technology for human rights. Obtaining consent and disclosing the technowizardry at work would have been appropriate, he said. Instead, viewers were stunned — first by the fact of the audio fakery, then by the director’s seeming dismissal of any ethical questions — and expressed their displeasure online.

“It touches also on our fears of death and ideas about the way people could take control of our digital likeness and make us say or do things without any way to stop it,” Gregory said.

Neville hasn’t identified what tool he used to recreate Bourdain’s voice but said he used it for a few sentences that Bourdain wrote but never said aloud.

“With the blessing of his estate and literary agent we used AI technology,” Neville said in a written statement. “It was a modern storytelling technique that I used in a few places where I thought it was important to make Tony’s words come alive.”

Neville also told GQ magazine that he got the approval of Bourdain’s widow and literary executor. The chef’s wife, Ottavia Busia, responded by tweet: “I certainly was NOT the one who said Tony would have been cool with that.”

Although tech giants like Microsoft, Google and Amazon have dominated text-to-speech research, there are now also a number of startups like Descript that offer voice-cloning software. The uses range from talking customer service chatbots to video games and podcasting.


Many of these voice cloning companies prominently feature an ethics policy on their website that explains the terms of use. Of nearly a dozen firms contacted by The Associated Press, many said they didn’t recreate Bourdain’s voice and wouldn’t have if asked. Others didn’t respond.

“We have pretty strong policies around what can be done on our platform,” said Zohaib Ahmed, founder and CEO of Resemble AI, a Toronto company that sells a custom AI voice generator service. “When you’re creating a voice clone, it requires consent from whoever’s voice it is.”

Ahmed said the rare occasions where he’s allowed some posthumous voice cloning were for academic research, including a project working with the voice of Winston Churchill, who died in 1965.

Ahmed said a more common commercial use is to edit a TV ad recorded by real voice actors and then customize it to a region by adding a local reference. It’s also used to dub anime movies and other videos, by taking a voice in one language and making it speak a different language, he said.

He compared it to past innovations in the entertainment industry, from stunt actors to greenscreen technology.

Just seconds or minutes of recorded human speech can help teach an AI system to generate its own synthetic speech, though getting it to capture the clarity and rhythm of Anthony Bourdain’s voice probably took a lot more training, said Rupal Patel, a professor at Northeastern University who runs another voice-generating company, VocaliD, that focuses on customer service chatbots.

“If you wanted it to speak really like him, you’d need a lot, maybe 90 minutes of good, clean data,” she said. “You’re building an algorithm that learns to speak like Bourdain spoke.”
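Neville has not said which tool he used, but the adaptation process Patel describes, starting from a text-to-speech model pretrained on many voices and continuing training on the target speaker’s clean recordings, can be sketched roughly as below. This is an illustrative outline only, assuming a generic PyTorch setup; the `SpeakerClips` dataset and `finetune` helper are hypothetical names, not any vendor’s actual API.

```python
# Hypothetical sketch of speaker-adaptation fine-tuning: take a TTS model already
# pretrained on many voices and keep training it on one speaker's clean recordings
# (roughly the "90 minutes of good, clean data" Patel mentions) so it picks up that
# speaker's clarity and rhythm. Names and loss choices are illustrative assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class SpeakerClips(Dataset):
    """(token_ids, mel_spectrogram) pairs cut from the target speaker's recordings."""

    def __init__(self, clips):
        self.clips = clips  # list of (token_ids, mel) tensor pairs, padded to equal length offline

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, i):
        return self.clips[i]


def finetune(pretrained_tts: nn.Module, clips, epochs: int = 10, lr: float = 1e-4):
    """Adapt a pretrained text-to-speech model to a single target speaker."""
    loader = DataLoader(SpeakerClips(clips), batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(pretrained_tts.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # spectrogram regression loss, a common choice in TTS training

    pretrained_tts.train()
    for _ in range(epochs):
        for token_ids, target_mel in loader:
            pred_mel = pretrained_tts(token_ids)  # model predicts a spectrogram from text
            loss = loss_fn(pred_mel, target_mel)  # pull predictions toward the speaker's audio
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return pretrained_tts
```

The more clean speech the dataset holds, the more of the speaker’s phrasing the model can imitate, which is why a convincing clone of a distinctive voice like Bourdain’s would likely need far more than a few seconds of audio.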

Neville is an acclaimed documentarian who also directed the Fred Rogers portrait “Won’t You Be My Neighbor?” and the Oscar-winning “20 Feet From Stardom.” He began making his latest movie in 2019, more than a year after Bourdain’s death by suicide in June 2018.

## Anthony Bourdain voice-cloning 'a slippery slope' when it comes to AI


The revelation that a documentary filmmaker used voice-cloning software to make the late chef Anthony Bourdain say words he never spoke has drawn criticism amid ethical concerns about use of the powerful technology.

The movie “Roadrunner: A Film About Anthony Bourdain” appeared in cinemas Friday and mostly features real footage of the beloved celebrity chef and globe-trotting television host before he died in 2018. But its director, Morgan Neville, told The New Yorker that a snippet of dialogue was created using artificial intelligence technology.

That’s renewed a debate about the future of voice-cloning technology, not just in the entertainment world but in politics and a fast-growing commercial sector dedicated to transforming text into realistic-sounding human speech.

“Unapproved voice cloning is a slippery slope,” said Andrew Mason, the founder and CEO of voice generator Descript, in a blog post Friday. “As soon as you get into a world where you’re making subjective judgment calls about whether specific cases can be ethical, it won’t be long before anything goes.”

Before this week, most of the public controversy around such technologies focused on the creation of hard-to-detect deepfakes using simulated audio and/or video and their potential to fuel misinformation and political conflict.

But Mason, who previously founded and led Groupon, said in an interview that Descript has repeatedly rejected requests to bring back a voice, including from “people who have lost someone and are grieving.”

“It’s not even so much that we want to pass judgment,” he said. “We’re just saying you have to have some bright lines in what’s OK and what’s not.”

Angry and uncomfortable reactions to the voice cloning in the Bourdain case reflect expectations and issues of disclosure and consent, said Sam Gregory, program director at Witness, a nonprofit working on using video technology for human rights. Obtaining consent and disclosing the technowizardry at work would have been appropriate, he said. Instead, viewers were stunned – first by the fact of the audio fakery, then by the director’s seeming dismissal of any ethical questions – and expressed their displeasure online.

“It touches also on our fears of death and ideas about the way people could take control of our digital likeness and make us say or do things without any way to stop it,” Gregory said.

Neville hasn’t identified what tool he used to recreate Bourdain’s voice but said he used it for a few sentences that Bourdain wrote but never said aloud.

“With the blessing of his estate and literary agent we used AI technology,” Neville said in a written statement. “It was a modern storytelling technique that I used in a few places where I thought it was important to make Tony’s words come alive.”

Neville also told GQ magazine that he got the approval of Bourdain’s widow and literary executor. The chef’s wife, Ottavia Busia, responded by tweet: “I certainly was NOT the one who said Tony would have been cool with that.”

I certainly was NOT the one who said Tony would have been cool with that. https://t.co/CypDvc1sBP — Ottavia (@OttaviaBourdain) July 16, 2021

Although tech giants like Microsoft, Google and Amazon have dominated text-to-speech research, there are now also a number of startups like Descript that offer voice-cloning software. The uses range from talking customer service chatbots to video games and podcasting.

Many of these voice cloning companies prominently feature an ethics policy on their website that explains the terms of use. Of nearly a dozen firms contacted by The Associated Press, many said they didn’t recreate Bourdain’s voice and wouldn’t have if asked. Others didn’t respond.

“We have pretty strong policies around what can be done on our platform,” said Zohaib Ahmed, founder and CEO of Resemble AI, a Toronto company that sells a custom AI voice generator service. “When you’re creating a voice clone, it requires consent from whoever’s voice it is.”

Ahmed said the rare occasions where he’s allowed some posthumous voice cloning were for academic research, including a project working with the voice of Winston Churchill, who died in 1965.

Ahmed said a more common commercial use is to edit a TV ad recorded by real voice actors and then customize it to a region by adding a local reference. It’s also used to dub anime movies and other videos, by taking a voice in one language and making it speak a different language, he said.

He compared it to past innovations in the entertainment industry, from stunt actors to greenscreen technology.

Just seconds or minutes of recorded human speech can help teach an AI system to generate its own synthetic speech, though getting it to capture the clarity and rhythm of Anthony Bourdain’s voice probably took a lot more training, said Rupal Patel, a professor at Northeastern University who runs another voice-generating company, VocaliD, that focuses on customer service chatbots.

“If you wanted it to speak really like him, you’d need a lot, maybe 90 minutes of good, clean data,” she said. “You’re building an algorithm that learns to speak like Bourdain spoke.”

Neville is an acclaimed documentarian who also directed the Fred Rogers portrait “Won’t You Be My Neighbor?” and the Oscar-winning “20 Feet From Stardom.” He began making his latest movie in 2019, more than a year after Bourdain’s death by suicide in June 2018.

## AI Voice Generation Used for New Anthony Bourdain Documentary, Is This Deepfake?


An AI voice-generation method was used in the latest Anthony Bourdain documentary to voice words attributed to the late personality, an approach that borders on deepfake territory. The words were never actually spoken on any recording or in any media by the late celebrity chef; they are lines the director wanted to include in the film.

Over recent months, and throughout the pandemic, deepfakes have been used to spread fake news and mislead people into believing all sorts of things on the internet. They have already alarmed governments, and some have been taken down by social media platforms for maligning people.

### AI Voice Generation for Anthony Bourdain

In an interview with The New Yorker, director Morgan Neville revealed that he used artificial intelligence to generate a copy of the late Anthony Bourdain's voice for inclusion in his film. The audio clip was never spoken by Bourdain while he was alive; it is material Neville wanted to capture for the documentary film, "Roadrunner."

This has turned a lot of heads and drawn massive attention, especially given the lengths the director went to in making the film.

And while using modern technology to one's advantage is not a crime, there are aspects that call for caution, especially making a dead person say things he never said.


In GQ's interview with the director, he revealed that he approached as many as four companies to create the best possible replication of Bourdain's voice. It was all done for the film's purposes and nothing else, but it has still drawn a lot of criticism.

### Roadrunner: Deepfake Documentary?

The documentary film "Roadrunner" has been met with criticism and ethical concerns over its use of AI voice generation, because the audio is not material the team actually had on record. The generated content came from the creative choices of the team rather than from existing, publicly available media.

While the documentary uses a deepfake-like technique for some of its content, it is not necessarily a "deepfake documentary": most of what it shows remains authentic and not AI-generated. However, the interviews did not reveal in detail which parts were AI-generated, so the disclosure alone may give the impression that more of it is synthetic than is actually the case.

### Remembering the Late Anthony Bourdain by AI Recreation

Anthony Bourdain passed away in 2018, and his legacy has been remembered by the media ever since, as he was a massive personality in popular culture. The former television host imparted a great deal of knowledge and shared many cultures through his documentaries, making him an iconic name in the industry.

"Roadrunner" will showcase the memory of the late Anthony Bourdain, but integrating generated content to give people further insight into him is quite a step.



## Explained: Why is the use of the Anthony Bourdain AI voice in Roadrunner docu drawing criticism


“Unapproved voice cloning is a slippery slope”: The use of deepfakes in Anthony Bourdain’s docu Roadrunner has drawn criticism amid ethical concerns about the use of this technology.

The revelation that a documentary filmmaker used voice-cloning software to make the late chef Anthony Bourdain say words he never spoke has drawn criticism amid ethical concerns about the use of the powerful technology.

The movie Roadrunner: A Film About Anthony Bourdain appeared in cinemas Friday and mostly features real footage of the beloved celebrity chef and globe-trotting television host before he died in 2018. But its director, Morgan Neville, told The New Yorker that a snippet of dialogue was created using artificial intelligence technology.

That’s renewed a debate about the future of voice-cloning technology, not just in the entertainment world but in politics and a fast-growing commercial sector dedicated to transforming text into realistic-sounding human speech.

### A dicey territory

“Unapproved voice cloning is a slippery slope,” said Andrew Mason, the founder and CEO of voice generator Descript, in a blog post on Friday. “As soon as you get into a world where you’re making subjective judgment calls about whether specific cases can be ethical, it won’t be long before anything goes.”

Before this week, most of the public controversy around such technologies focused on the creation of hard-to-detect deepfakes using simulated audio and/or video and their potential to fuel misinformation and political conflict.

But Mason, who previously founded and led Groupon, said in an interview that Descript has repeatedly rejected requests to bring back a voice, including from “people who have lost someone and are grieving.”

“It’s not even so much that we want to pass judgment,” he said. “We’re just saying you have to have some bright lines in what’s OK and what’s not.”

Angry and uncomfortable reactions to the voice cloning in the Bourdain case reflect expectations and issues of disclosure and consent, said Sam Gregory, program director at Witness, a nonprofit working on using video technology for human rights. Obtaining consent and disclosing the technowizardry at work would have been appropriate, he said. Instead, viewers were stunned — first by the fact of the audio fakery, then by the director’s seeming dismissal of any ethical questions — and expressed their displeasure online.

“It touches also on our fears of death and ideas about the way people could take control of our digital likeness and make us say or do things without any way to stop it,” Gregory said.

### Neville claims consent was taken

Neville hasn’t identified what tool he used to recreate Bourdain’s voice but said he used it for a few sentences that Bourdain wrote but never said aloud.

“With the blessing of his estate and literary agent we used AI technology,” Neville said in a written statement. “It was a modern storytelling technique that I used in a few places where I thought it was important to make Tony’s words come alive.”

Neville also told GQ magazine that he got the approval of Bourdain’s widow and literary executor. The chef’s wife, Ottavia Busia, responded by tweet: “I certainly was NOT the one who said Tony would have been cool with that.”

### Polarised opinions

Although tech giants like Microsoft, Google and Amazon have dominated text-to-speech research, there are now also a number of startups like Descript that offer voice-cloning software. The uses range from talking customer service chatbots to video games and podcasting.

Many of these voice cloning companies prominently feature an ethics policy on their website that explains the terms of use. Of nearly a dozen firms contacted by The Associated Press, many said they didn’t recreate Bourdain’s voice and wouldn’t have if asked. Others didn’t respond.

“We have pretty strong policies around what can be done on our platform,” said Zohaib Ahmed, founder and CEO of Resemble AI, a Toronto company that sells a custom AI voice generator service. “When you’re creating a voice clone, it requires consent from whoever’s voice it is.”

Ahmed said the rare occasions where he’s allowed some posthumous voice cloning were for academic research, including a project working with the voice of Winston Churchill, who died in 1965.

Ahmed said a more common commercial use is to edit a TV ad recorded by real voice actors and then customize it to a region by adding a local reference. It’s also used to dub anime movies and other videos, by taking a voice in one language and making it speak a different language, he said.

He compared it to past innovations in the entertainment industry, from stunt actors to greenscreen technology.

Just seconds or minutes of recorded human speech can help teach an AI system to generate its own synthetic speech, though getting it to capture the clarity and rhythm of Anthony Bourdain’s voice probably took a lot more training, said Rupal Patel, a professor at Northeastern University who runs another voice-generating company, VocaliD, that focuses on customer service chatbots.

“If you wanted it to speak really like him, you’d need a lot, maybe 90 minutes of good, clean data,” she said. “You’re building an algorithm that learns to speak like Bourdain spoke.”

Neville is an acclaimed documentarian who also directed the Fred Rogers portrait Won’t You Be My Neighbor? and the Oscar-winning 20 Feet From Stardom. He began making his latest movie in 2019, more than a year after Bourdain’s death by suicide in June 2018.

(With inputs from The Associated Press)

## Google’s Translatotron 2 removes ability to deepfake voices



In 2019, Google released Translatotron, an AI system capable of directly translating a person’s voice into another language. The system could create synthesized translations of voices to keep the sound of the original speaker’s voice intact. But Translatotron could also be used to generate speech in a different voice, making it ripe for potential misuse in, for example, deepfakes.

This week, researchers at Google quietly released a paper detailing Translatotron’s successor, Translatotron 2, which solves the original issue with Translatotron by restricting the system to retain the source speaker’s voice. Moreover, Translatotron 2 outperforms the original Translatotron by “a large margin” in terms of translation quality and naturalness, as well as “drastically” cutting down on undesirable artifacts, like babbling and long pauses.

As the researchers explain in the paper, Translatotron 2 consists of a source speech encoder, a target phoneme decoder, and a synthesizer, connected via an attention module. For every piece of data the encoder and decoder process, the attention module weighs the relevance of every other bit of data and draws from them to generate an output. The encoder creates a numerical representation of speech, while the decoder predicts phoneme sequences corresponding to the translated speech. (Phonemes are the smallest unit of sound that distinguishes one word from another word in a language.) As for the synthesizer, it takes the output from the decoder, as well as the context output from the attention module as its input, synthesizing the translated voice.
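As a structural sketch only (an assumption-level illustration in PyTorch, not Google’s released code), the components described above can be wired together roughly like this; layer types and dimensions are placeholder choices, and the real model decodes autoregressively rather than in a single pass.

```python
# Rough sketch of the described pipeline: a speech encoder, an attention module,
# a phoneme decoder, and a spectrogram synthesizer. Layer choices and sizes are
# illustrative assumptions, not the architecture Google actually shipped.
import torch
from torch import nn


class Translatotron2Sketch(nn.Module):
    def __init__(self, n_mels=80, d_model=256, n_phonemes=100):
        super().__init__()
        # Source speech encoder: turns source-language mel frames into hidden states.
        self.encoder = nn.LSTM(n_mels, d_model, num_layers=2, batch_first=True)
        # Attention links decoder/synthesizer steps back to relevant encoder frames.
        self.attention = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Target phoneme decoder: predicts the phoneme sequence of the translation.
        self.decoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.phoneme_head = nn.Linear(d_model, n_phonemes)
        # Synthesizer: consumes decoder output plus attention context, emits target mels.
        self.synthesizer = nn.LSTM(2 * d_model, d_model, batch_first=True)
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, source_mels):
        enc_out, _ = self.encoder(source_mels)                  # (B, T, d_model)
        dec_out, _ = self.decoder(enc_out)                      # simplified: no autoregression
        context, _ = self.attention(dec_out, enc_out, enc_out)  # weigh relevant source frames
        phoneme_logits = self.phoneme_head(dec_out)             # translated phoneme predictions
        synth_in = torch.cat([dec_out, context], dim=-1)
        synth_out, _ = self.synthesizer(synth_in)
        target_mels = self.mel_head(synth_out)                  # translated speech spectrogram
        return phoneme_logits, target_mels
```

Calling `Translatotron2Sketch()(torch.randn(2, 120, 80))` on two 120-frame mel inputs returns phoneme logits alongside a translated spectrogram, mirroring the paper’s idea of tying the synthesizer to the decoder and attention outputs rather than generating speech from text alone.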

(The original post embeds two audio clips at this point: a sample utterance in Spanish and Translatotron 2’s English translation of it.)

To prevent the system from generating speech in a different speaker’s voice, the researchers developed a method for voice retention that doesn’t rely on explicit IDs to identify the speakers, in contrast to the approach used with the original Translatotron. This makes Translatotron 2 more appropriate for production environments by mitigating potential abuse for creating deepfakes or spoofed voices, according to the research team.

“The performance of voice conversion has progressed rapidly in the recent years and is reaching a quality that is hard for automatic speaker verification systems to detect,” the researchers wrote in the paper. “Such progress poses concerns on related techniques being misused for creating spoofing artifacts, so we designed Translatotron 2 with the motivation of avoiding such potential misuse.”
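One way to picture that design decision, offered purely as a hedged illustration rather than the actual implementation: the original Translatotron’s synthesizer could be conditioned on a separately chosen speaker embedding, the input that made speech in someone else’s voice possible, whereas Translatotron 2 drops that input and learns voice retention only from training pairs whose target audio is already in the source speaker’s voice.

```python
# Toy contrast of the two conditioning schemes (illustrative assumption, not Google's code).
import torch
from torch import nn

d_model, n_mels, frames = 256, 80, 50
translation_features = torch.randn(1, frames, d_model)  # decoder/attention output for one utterance

# Translatotron-1-style synthesizer input: translation features concatenated with an
# arbitrary speaker embedding. Swapping that embedding swaps the output voice (spoofable).
speaker_embedding = torch.randn(1, 1, d_model)
synth_in_v1 = torch.cat(
    [translation_features, speaker_embedding.expand(1, frames, d_model)], dim=-1
)

# Translatotron-2-style synthesizer input: translation features only. There is no voice
# selector to abuse; the output voice is whatever the training data taught the model.
synth_in_v2 = translation_features

synthesizer_v2 = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_mels))
target_mels = synthesizer_v2(synth_in_v2)  # (1, frames, n_mels) spectrogram in the retained voice
```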

### Deepfake threat

The paper on Translatotron 2 comes as research shows businesses might be unprepared to combat deepfakes, or AI-generated media that takes a person in an existing recording and replaces them with someone else’s likeness. According to startup Deeptrace, the number of deepfakes on the web increased 330% from October 2019 to June 2020, reaching over 50,000 at their peak. And in a survey released earlier this year by Attestiv, fewer than 30% of organizations say they’ve taken steps to combat fallout from a deepfake attack.

The trend is troubling not only because these fakes might be used to sway opinion during an election or implicate a person in a crime, but because they’ve already been abused to generate pornographic material of actors and defraud a major energy producer. Earlier this year, the FBI warned that deepfakes are a critical emerging threat targeting businesses.

The fight against deepfakes is likely to remain challenging as media generation techniques continue to improve. With Translatotron 2, Google researchers hope to head off sophisticated efforts that might emerge in the future.