
+++
title = "The Ethics of a Deepfake Anthony Bourdain Voice"
date = "2021-07-22T19:42:58+08:00"
type = "blog"
banner = "img/banners/banner-3.jpg"
+++

## The Ethics of a Deepfake Anthony Bourdain Voice


The documentary “Roadrunner: A Film About Anthony Bourdain,” which opened in theatres on Friday, is an angry, elegant, often overwhelmingly emotional chronicle of the late television star’s life and his impact on the people close to him. Directed by Morgan Neville, the film portrays Bourdain as intense, self-loathing, relentlessly driven, preternaturally charismatic, and—in his life and in his death, by suicide, in 2018—a man who both focussed and disturbed the lives of those around him. To craft the film’s narrative, Neville drew on tens of thousands of hours of video footage and audio archives—and, for three particular lines heard in the film, Neville commissioned a software company to make an A.I.-generated version of Bourdain’s voice. News of the synthetic audio, which Neville discussed this past week in interviews with me and with Brett Martin, at GQ, provoked a striking degree of anger and unease among Bourdain’s fans. “Well, this is ghoulish”; “This is awful”; “WTF?!” people said on Twitter, where the fake Bourdain voice became a trending topic. The critic Sean Burns, who had reviewed the documentary negatively, tweeted, “I feel like this tells you all you need to know about the ethics of the people behind this project.”

When I first spoke with Neville, I was surprised to learn about his use of synthetic audio and equally taken aback that he’d chosen not to disclose its presence in his film. He admitted to using the technology for a specific voice-over that I’d asked about—in which Bourdain improbably reads aloud a despairing e-mail that he sent to a friend, the artist David Choe—but did not reveal the documentary’s other two instances of technological wizardry. Creating a synthetic Bourdain voice-over seemed to me far less crass than, say, a C.G.I. Fred Astaire put to work selling vacuum cleaners in a Dirt Devil commercial, or a holographic Tupac Shakur performing alongside Snoop Dogg at Coachella, and far more trivial than the intentional blending of fiction and nonfiction in, for instance, Errol Morris’s “The Thin Blue Line.” Neville used the A.I.-generated audio only to narrate text that Bourdain himself had written. Bourdain composed the words; he just—to the best of our knowledge—never uttered them aloud. Some of Neville’s critics contend that Bourdain should have the right to control the way his written words are delivered. But doesn’t a person relinquish that control anytime his writing goes out into the world? The act of reading—whether an e-mail or a novel, in our heads or out loud—always involves some degree of interpretation. I was more troubled by the fact that Neville said he hadn’t interviewed Bourdain’s former girlfriend Asia Argento, who is portrayed in the film as the agent of his unravelling.

Besides, documentary film, like nonfiction writing, is a broad and loose category, encompassing everything from unedited, unmanipulated vérité to highly constructed and reconstructed narratives. Winsor McCay’s short “The Sinking of the Lusitania,” a propaganda film, from 1918, that’s considered an early example of the animated-documentary form, was made entirely from reënacted and re-created footage. Ari Folman’s Oscar-nominated “Waltz with Bashir,” from 2008, is a cinematic memoir of war told through animation, with an unreliable narrator, and with the inclusion of characters who are entirely fictional. Vérité is “merely a superficial truth, the truth of accountants,” Werner Herzog wrote in his famous manifesto “Minnesota Declaration.” “There are deeper strata of truth in cinema, and there is such a thing as poetic, ecstatic truth. It is mysterious and elusive, and can be reached only through fabrication and imagination and stylization.” At the same time, “deepfakes” and other computer-generated synthetic media have certain troubling connotations—political machinations, fake news, lies wearing the HD-rendered face of truth—and it is natural for viewers, and filmmakers, to question the boundaries of their responsible use. Neville’s offhand comment, in his interview with me, that “we can have a documentary-ethics panel about it later,” did not help assure people that he took these matters seriously.

On Friday, to help me unknot the tangle of ethical and emotional questions raised by the three bits of “Roadrunner” audio (totalling a mere forty-five seconds), I spoke to two people who would be well-qualified for Neville’s hypothetical ethics panel. The first, Sam Gregory, is a former filmmaker and the program director of Witness, a human-rights nonprofit that focusses on ethical applications of video and technology. “In some senses, this is quite a minor use of a synthetic-media technology,” he told me. “It’s a few lines in a genre where you do sometimes construct things, where there aren’t fixed norms about what’s acceptable.” But, he explained, Neville’s re-creation, and the way he used it, raise fundamental questions about how we define ethical use of synthetic media.

The first has to do with consent, and what Gregory described as our “queasiness” around manipulating the image or voice of a deceased person. In Neville’s interview with GQ, he said that he had pursued the A.I. idea with the support of Bourdain’s inner circle—“I checked, you know, with his widow and his literary executor, just to make sure people were cool with that,” he said. But early on Friday morning, as the news of his use of A.I. ricocheted, his ex-wife Ottavia Busia tweeted, “I certainly was NOT the one who said Tony would have been cool with that.” On Saturday afternoon, Neville wrote to me that the A.I. idea “was part of my initial pitch of having Tony narrate the film posthumously à la Sunset Boulevard—one of Tony’s favorite films and one he had even reenacted himself on Cook’s Tour,” adding, “I didn’t mean to imply that Ottavia thought Tony would’ve liked it. All I know is that nobody ever expressed any reservations to me.” (Busia told me, in an e-mail, that she recalled the idea of A.I. coming up in an initial conversation with Neville and others, but that she didn’t realize that it had actually been used until the social-media flurry began. “I do believe Morgan thought he had everyone’s blessing to go ahead,” she wrote. “I took the decision to remove myself from the process early on because it was just too painful for me.”)

A second core principle is disclosure—how the use of synthetic media is or is not made clear to an audience. Gregory brought up the example of “Welcome to Chechnya,” the film, from 2020, about underground Chechen activists who work to free survivors of the country’s violent anti-gay purges. The film’s director, David France, relied on deepfake technology to protect the identities of the film’s subjects by swapping their faces for others, but he left a slight shimmer around the heads of the activists to alert his viewers to the manipulation—what Gregory described as an example of “creative signalling.” “It’s not like you need to literally label something—it’s not like you need to write something across the bottom of the screen every time you use a synthetic tool—but it’s responsible to just remind the audience that this is a representation,” he said. “If you look at a Ken Burns documentary, it doesn’t say ‘reconstruction’ at the bottom of every photo he’s animated. But there’s norms and context—trying to think, within the nature of the genre, how we might show manipulation in a way that’s responsible to the audience and doesn’t deceive them.”
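To make the idea concrete: a disclosure cue like the shimmer in “Welcome to Chechnya” can be as simple as compositing a soft halo over the manipulated region. Below is a minimal Python/OpenCV sketch of that kind of signalling; the input file and face coordinates are hypothetical placeholders, and this illustrates the concept rather than the technique France’s team actually used.

```python
import cv2
import numpy as np

def add_disclosure_halo(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Overlay a soft glowing ring around a face region (x, y, w, h)
    to signal to viewers that the region has been synthetically altered."""
    x, y, w, h = box
    glow = np.zeros_like(frame)
    center = (x + w // 2, y + h // 2)
    axes = (int(w * 0.75), int(h * 0.75))
    # Draw a thick white ellipse around the region, then blur it into a halo.
    cv2.ellipse(glow, center, axes, 0, 0, 360, (255, 255, 255), thickness=12)
    glow = cv2.GaussianBlur(glow, (31, 31), 0)
    # Additively blend the soft ring onto the frame at partial opacity.
    return cv2.addWeighted(frame, 1.0, glow, 0.6, 0)

# Hypothetical usage: mark one frame of a face-swapped clip.
cap = cv2.VideoCapture("swapped_faces.mp4")  # placeholder input file
ok, frame = cap.read()
if ok:
    cv2.imwrite("marked_frame.png", add_disclosure_halo(frame, (200, 120, 160, 200)))
cap.release()
```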

Gregory suggested that much of the discomfort people are feeling about “Roadrunner” might stem from the novelty of the technology. “I’m not sure that it’s even all that much about what the director did in this film—it’s because it’s triggering us to think how this will play out, in terms of our norms of what’s acceptable, our expectations of media,” he said. “It may well be that in a couple of years we are comfortable with this, in the same way we’re comfortable with a narrator reading a poem, or a letter from the Civil War.”

“There are really awesome creative uses for these tools,” my second interviewee, Karen Hao, an editor at the MIT Technology Review who focusses on artificial intelligence, told me. “But we have to be really cautious of how we use them early on.” She brought up two recent deployments of deepfake technology that she considers successful. The first, a 2020 collaboration between artists and A.I. companies, is an audio-video synthetic representation of Richard Nixon reading his infamous “In Event of Moon Disaster” speech, which he would have delivered had the Apollo 11 mission failed and Neil Armstrong and Buzz Aldrin perished. (“The first time I watched it, I got chills,” Hao said.) The second, an episode of “The Simpsons,” from March, in which the character Mrs. Krabappel, voiced by the late actress Marcia Wallace, was resurrected by splicing together phonemes from earlier recordings, passed her ethical litmus test because, in a fictional show like “The Simpsons,” “you know that the person’s voice is not representing them, so there’s less attachment to the fact that the voice might be fake,” Hao said. But, in the context of a documentary, “you’re not expecting to suddenly be viewing fake footage, or hearing fake audio.”
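The Mrs. Krabappel technique Hao describes is, at its core, concatenative synthesis: stitching stored speech fragments together rather than generating new audio. Here is a toy sketch using the pydub library, assuming a pre-cut, pre-labeled set of phoneme WAV clips (the filenames and phoneme inventory are hypothetical):

```python
from pydub import AudioSegment

# Hypothetical library of phoneme clips cut from archival recordings,
# e.g. phonemes/HH.wav, phonemes/AH.wav, ...
PHONEME_CLIPS = {p: f"phonemes/{p}.wav" for p in ["HH", "AH", "L", "OW"]}

def splice(phonemes, crossfade_ms=15):
    """Concatenate per-phoneme recordings into one utterance, with a short
    crossfade at each join so the seams are less audible."""
    out = AudioSegment.from_wav(PHONEME_CLIPS[phonemes[0]])
    for p in phonemes[1:]:
        nxt = AudioSegment.from_wav(PHONEME_CLIPS[p])
        out = out.append(nxt, crossfade=crossfade_ms)
    return out

# "Hello" as a rough ARPAbet phoneme sequence: HH AH L OW
splice(["HH", "AH", "L", "OW"]).export("hello_spliced.wav", format="wav")
```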

A particularly unsettling aspect of the Bourdain voice clone, Hao speculated, may be its hybridization of reality and unreality: “It’s not clearly faked, nor is it clearly real, and the fact that it was his actual words just muddles that even more.” In the world of broadcast media, deepfake and synthetic technologies are logical successors to ubiquitous—and more discernible—analog and digital manipulation techniques. Already, face renders and voice clones are an up-and-coming technology in scripted media, especially in high-budget productions, where they promise to provide an alternative to laborious and expensive practical effects. But the potential of these technologies is undermined “if we introduce the public to them in jarring ways,” Hao said, adding, “It could prime the public to have a more negative perception of this technology than perhaps is deserved.” The fact that the synthetic Bourdain voice was undetected until Neville pointed it out is part of what makes it so unnerving. “I’m sure people are asking themselves, How many other things have I heard where I thought this is definitely real, because this is something X person would say, and it was actually fabricated?” Hao said. Still, she added, “I would urge people to give the guy”—Neville—“some slack. This is such fresh territory. . . . It’s completely new ground. I would personally be inclined to forgive him for crossing a boundary that didn’t previously exist.”

## The Anthony Bourdain audio deepfake is forcing a debate about AI in journalism


By now, using machine learning to simulate a dead person on screen is an accepted Hollywood technique. Synthetic media, widely known as “deepfake” technology (a portmanteau of “deep learning” and “fake”), has famously been used to re-create Carrie Fisher and Peter Cushing in “Rogue One: A Star Wars Story.” In 2019, footage of comedian Jimmy Fallon eerily transformed into Donald Trump demonstrated how advanced the technology has become.

But no one was laughing when it was revealed that deepfake technology was used to simulate Anthony Bourdain’s voice in the new documentary Roadrunner: A Film About Anthony Bourdain. In an interview with GQ, director Morgan Neville revealed that he commissioned an AI model of the chef and TV personality’s voice and considered using it to narrate the entire film. Neville told the New Yorker that he used it for three lines in the final cut of the two-hour production. Among them is a poignant line from an email to artist David Choe: “My life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?”

Neville uses the AI-generated clip as an artistic touch to heighten the pathos as Choe recounts the last email he received before Bourdain took his life in 2018. The audio is convincing, albeit a tad flatter than the rest of Bourdain’s narration.
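That perceived flatness is largely reduced pitch movement, and it can be roughly quantified. Below is a crude sketch using librosa that compares the spread of the fundamental-frequency contour in a real clip versus a synthetic one; the file names are hypothetical placeholders, and this is a heuristic illustration, not a deepfake detector.

```python
import librosa
import numpy as np

def pitch_variability(path: str) -> float:
    """Crude prosody proxy: standard deviation of the voiced F0 contour.
    A flatter, more monotone delivery yields a smaller value."""
    y, sr = librosa.load(path, sr=None)
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C5"), sr=sr
    )
    return float(np.nanstd(f0))  # NaN frames (unvoiced) are ignored

# Hypothetical clips: archival narration vs. the AI-generated line.
print("real :", pitch_variability("bourdain_archival.wav"))
print("fake :", pitch_variability("bourdain_synthetic.wav"))
```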

### A brewing controversy over manufacturing Bourdain’s voice

Neville, a former journalist, didn’t see a problem with mixing AI-generated sound bites with actual clips of Bourdain’s voice. “We can have a documentary ethics panel about it later,” he joked in the New Yorker interview.

Neville added that he obtained consent from Bourdain’s estate. “I checked, you know, with his widow and his literary executor, just to make sure people were cool with that. And they were like, ‘Tony would have been cool with that.’ I wasn’t putting words into his mouth. I was just trying to make them come alive,” he explained to GQ. Bourdain’s ex-wife, Ottavia Busia-Bourdain, who appears extensively in the documentary, later disputed that she had given permission for an audio surrogate.

Meredith Broussard, a New York University journalism professor and author of the book Artificial Unintelligence: How Computers Misunderstand the World, says it’s understandable that many find Bourdain’s audio clone deeply unsettling. “I’m not surprised that his widow doesn’t feel like she gave permission for this,” she says. “It’s such new technology that nobody really expects that it’s going to be used this way.”

Using AI in journalism poses the greater ethical dilemma, Broussard says. “People are more forgiving when we use this kind of technology in fiction as opposed to documentaries,” she explains. “In a documentary, people feel like it’s real and so they feel duped.”

### Simulated media and AI in journalism

Roadrunner, which is co-produced by CNN, isn’t the first time a news organization has relied on AI. The Associated Press, for instance, has been using AI to auto-generate articles about corporate quarterly earnings since 2015. Every auto-generated AP article is appended with a note: “This story was generated by Automated Insights,” referring to the machine-learning technology it uses.
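AP’s earnings stories are template-driven natural-language generation rather than free-form writing. The sketch below shows the general pattern, with the disclosure note appended to every output; it is a simplified illustration, not Automated Insights’ actual system, and the company name and figures are made up.

```python
# Simplified template-driven earnings story, in the spirit of automated
# financial reporting. All names and figures below are hypothetical.
TEMPLATE = (
    "{company} on {day} reported quarterly earnings of ${eps:.2f} per share, "
    "{verb} the analyst consensus of ${consensus:.2f}. Revenue came in at "
    "${revenue:.1f} billion."
)
DISCLOSURE = "This story was generated by Automated Insights."

def earnings_story(company, day, eps, consensus, revenue):
    # Pick the verb from the structured data, as template NLG systems do.
    if eps > consensus:
        verb = "beating"
    elif eps < consensus:
        verb = "missing"
    else:
        verb = "matching"
    body = TEMPLATE.format(company=company, day=day, eps=eps, verb=verb,
                           consensus=consensus, revenue=revenue)
    # Every auto-generated story carries the disclosure note.
    return body + "\n\n" + DISCLOSURE

print(earnings_story("Acme Corp", "Tuesday", eps=1.42, consensus=1.35, revenue=3.2))
```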

Clearly disclosing instances when AI is used is imperative, Broussard says. That Neville discussed it after the fact is noteworthy, although it’s debatable whether that will satisfy his critics. “It’s interesting that documentarians are going to have to think about the ethics of deepfakes,” Broussard says. “They’ve always thought about the ethics of storytelling, just like journalists have, but here’s a whole new realm that we’re going to have to develop ethical norms for.”

The controversy reawakens a longstanding debate about how journalists quote their subjects. Celebrated writer Gay Talese, for instance, reconstructs quotes as he remembers them, believing that the tape recorder is “the death knell of literary reportage.” In her book The Journalist and the Murderer, the New Yorker’s Janet Malcolm underscored the problem of combining fragments of multiple interviews into single statements, as reporter Joe McGinniss did in covering the murder trial of former doctor Jeffrey MacDonald. “The journalist cannot create his subjects any more than the analyst can create his patients,” she wrote. The late writer herself was embroiled in a decade-long legal battle over five quotes she used in a 1983 profile about the Sigmund Freud archives. The libel case was ultimately decided in Malcolm’s favor.

### The ethics of deepfakes

Broussard is unsure where she stands about Neville’s use of deepfake technology. “The thing about ethics is that it’s about context,” she explains. “Three lines in a documentary movie—it’s not the end of the world, but it’s important as a precedent. And it’s important to have a conversation about whether we think this is an appropriate thing to do.”

Ultimately, Broussard says the Roadrunner controversy presents another argument for regulating the use of AI overall. “There is an emerging conversation in the machine learning field about the need for ethics and machine learning,” she says. “I am grateful that this conversation has begun because it is long overdue.”

## Anthony Bourdain 'Roadrunner' controversy raises a question: Is the term 'documentary' obsolete?


“It’s a narrative with a twist, not a documentary with re-creations,” she explains. “Aside from the fact that 80 percent of the movie is scripted and played by actors, I think the expectations are different when something’s a narrative. . . . You owe a lot of people a lot of things when you make a documentary film. You owe a huge debt to the subjects who have entrusted their lives and their time to you — without payment, at least most all of the time. And you owe the audience your most truthful interpretation of the story. You owe it to them to be much more transparent than you would ever be in a fiction film with how something occurred. You owe a lot to everybody.”

## Deepfake: Why the Anthony Bourdain voice cloning creeps people out



The revelation that a documentary filmmaker used voice-cloning software to make the late chef Anthony Bourdain say words he never spoke has drawn criticism amid ethical concerns about use of the powerful technology.

The movie “Roadrunner: A Film About Anthony Bourdain” appeared in cinemas Friday and mostly features real footage of the beloved celebrity chef and globe-trotting television host before he died in 2018. But its director, Morgan Neville, told The New Yorker that a snippet of dialogue was created using artificial intelligence technology.

That has renewed a debate about the future of voice-cloning technology, not just in the entertainment world but in politics and a fast-growing commercial sector dedicated to transforming text into realistic-sounding human speech.

“Unapproved voice cloning is a slippery slope,” said Andrew Mason, the founder and CEO of the voice generator Descript, in a blog post Friday. “As soon as you get into a world where you’re making subjective judgment calls about whether specific cases can be ethical, it won’t be long before anything goes.”

Before this week, most of the public controversy around such technologies focused on the creation of hard-to-detect deepfakes using simulated audio and/or video and their potential to fuel misinformation and political conflict.

But Mason, who previously founded and led Groupon, said in an interview that Descript has repeatedly rejected requests to bring back a voice, including from “people who have lost someone and are grieving.”

“It’s not even so much that we want to pass judgment,” he said. “We’re just saying you have to have some bright lines in what’s OK and what’s not.”

Angry and uncomfortable reactions to the voice cloning in the Bourdain case reflect expectations and issues of disclosure and consent, said Sam Gregory, program director at Witness, a nonprofit working on using video technology for human rights. Obtaining consent and disclosing the technowizardry at work would have been appropriate, he said. Instead, viewers were stunned - first by the fact of the audio fakery, then by the director’s seeming dismissal of any ethical questions - and expressed their displeasure online.

“It touches also on our fears of death and ideas about the way people could take control of our digital likeness and make us say or do things without any way to stop it,” Gregory said.

Neville hasn’t identified what tool he used to recreate Bourdain’s voice but said he used it for a few sentences that Bourdain wrote but never said aloud.

“With the blessing of his estate and literary agent we used AI technology,” Neville said in a written statement. “It was a modern storytelling technique that I used in a few places where I thought it was important to make Tony’s words come alive.”

Neville also told GQ magazine that he got the approval of Bourdain’s widow and literary executor. The chef’s wife, Ottavia Busia, responded by tweet: “I certainly was NOT the one who said Tony would have been cool with that.”

Although tech giants like Microsoft, Google and Amazon have dominated text-to-speech research, there are now also a number of startups like Descript that offer voice-cloning software. The uses range from talking customer-service chatbots to video games and podcasting.

Many of these voice-cloning companies prominently feature an ethics policy on their websites that explains the terms of use. Of nearly a dozen firms contacted by The Associated Press, many said they didn’t recreate Bourdain’s voice and wouldn’t have if asked. Others didn’t respond.

“We have pretty strong policies around what can be done on our platform,” said Zohaib Ahmed, founder and CEO of Resemble AI, a Toronto company that sells a custom AI voice-generator service. “When you’re creating a voice clone, it requires consent from whoever’s voice it is.”

Ahmed said the rare occasions where he has allowed posthumous voice cloning were for academic research, including a project working with the voice of Winston Churchill, who died in 1965.

Ahmed said a more common commercial use is to edit a TV ad recorded by real voice actors and then customize it for a region by adding a local reference. The technology is also used to dub anime movies and other videos, taking a voice in one language and making it speak a different one, he said.

He compared it to past innovations in the entertainment industry, from stunt actors to green-screen technology.

Just seconds or minutes of recorded human speech can help teach an AI system to generate its own synthetic speech, though getting it to capture the clarity and rhythm of Anthony Bourdain’s voice probably took a lot more training, said Rupal Patel, a professor at Northeastern University who runs another voice-generating company, VocaliD, that focuses on customer-service chatbots.

“If you wanted it to speak really like him, you’d need a lot, maybe 90 minutes of good, clean data,” she said. “You’re building an algorithm that learns to speak like Bourdain spoke.”

Neville is an acclaimed documentarian who also directed the Fred Rogers portrait “Won’t You Be My Neighbor?” and the Oscar-winning “20 Feet From Stardom.” He began making his latest movie in 2019, more than a year after Bourdain’s death by suicide in June 2018.
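Neville hasn’t named the tool he commissioned, but the barrier to entry is strikingly low. As a sketch of the general technique, here is few-shot voice cloning with the open-source Coqui TTS library; the model name is real, the file paths are hypothetical placeholders, and this illustrates the category of technology rather than anything actually used on the film.

```python
# Sketch of few-shot voice cloning with the open-source Coqui TTS library.
# This illustrates the general technique; it is not the (undisclosed) tool
# used on "Roadrunner". File paths below are hypothetical placeholders.
from TTS.api import TTS

# YourTTS is a multilingual model that conditions on a short reference clip
# of the target speaker instead of requiring hours of training data.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

tts.tts_to_file(
    text="This is a synthetic voice, generated for demonstration only.",
    speaker_wav="reference_speaker.wav",  # a few seconds of the target voice
    language="en",
    file_path="cloned_line.wav",
)
```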

## New Anthony Bourdain documentary deepfakes his voice


In a new documentary, Roadrunner, about the life and tragic death of Anthony Bourdain, there are a few lines of dialogue in Bourdain’s voice that he might not have ever said out loud.

Filmmaker Morgan Neville used AI technology to digitally re-create Anthony Bourdain’s voice and have the software synthesize the audio of three quotes from the late chef and television host, Neville told the New Yorker.

The deepfaked voice was discovered when the New Yorker’s Helen Rosner asked how the filmmaker got a clip of Bourdain’s voice reading an email he had sent to a friend. Neville said he had contacted an AI company and supplied it with a dozen hours of Bourdain speaking.

“ ... and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” Bourdain wrote in an email, and an AI algorithm later narrated an approximation of his voice.

You can hear the line in the documentary’s trailer, right around the 1:30 mark. The synthetic quality of the voice is especially audible when it says, “and I am successful.”

Neville told Rosner that there were three lines of dialogue he wanted Bourdain’s voice to read, but he couldn’t find existing audio to string together or otherwise make them work.

There’s no shortage of companies that can achieve this kind of AI voice replication; in fact, a burgeoning industry specifically generates voices for video-game characters or lets you clone your own voice.

But whether it’s ethical to clone a dead person’s voice and have them say things they hadn’t gotten on tape when they were alive is another question, and one Neville doesn’t seem too concerned with.

“We can have a documentary-ethics panel about it later,” he told the New Yorker.