- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="utf-8" />
- <meta name="viewport" content="width=device-width,initial-scale=1" />
- <title>FastSpeech 2 Audio Samples</title>
- <style>
- body {
- margin: 0% 15%;
- padding: 50px 30px;
- background: #fff;
- color: #111;
- font-size: 17px;
- font-family: sans-serif;
- font-weight: 400;
- line-height: 1.8;
- overflow-x: hidden;
- -webkit-font-smoothing: antialiased;
- }
- h1 {
- font-size: 1.75em;
- }
- h2 {
- margin-bottom: 0.4em;
- }
- hr {
- height: 0.5px;
- border-width: 0;
- color: lightgray;
- background-color: lightgray;
- }
- table {
- width: 100%;
- }
- audio {
- width: 100%;
- }
- </style>
- </head>
- <body>
- <h1>FastSpeech 2 Audio Samples</h1>
- <p>
- <h2>English Single-Speaker TTS</h2>
- <div>
- <b>Dataset</b>: <a href="https://keithito.com/LJ-Speech-Dataset/">LJSpeech</a>
- </div>
- <div>
- <b>Checkpoint</b>: <a
- href="https://drive.google.com/file/d/1r3fYhnblBJ8hDKDSUDtidJ-BN-xAM9pe/view?usp=sharing">link</a>
- </div>
- <div>
- <b>Config</b>: <a href="https://github.com/ming024/FastSpeech2/tree/master/config/LJSpeech">link</a>
- </div>
- <div>
- <b>Vocoder</b>: <a href="https://github.com/jik876/hifi-gan">HiFi-GAN (LJSpeech)</a>
- </div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">Extremely Long Sentence</th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJSpeech_long_sentence.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>Advanced text-to-speech models such as FastSpeech can synthesize speech significantly faster
- than previous autoregressive models with comparable quality. The training of FastSpeech model relies on an
- autoregressive teacher model for duration prediction and knowledge distillation, which can ease the one-to-many
- mapping problem in TTS. However, FastSpeech has several disadvantages: 1) the teacher-student distillation
- pipeline is complicated, 2) the duration extracted from the teacher model is not accurate enough, and the target
- mel-spectrograms distilled from teacher model suffer from information loss due to data simplification, both of
- which limit the voice quality. In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech
- and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth
- target instead of the simplified output from teacher, and 2) introducing more variation information of speech as
- conditional inputs. Specifically, we extract duration, pitch and energy from speech waveform and directly take
- them as conditional inputs during training and use predicted values during inference. We further design FastSpeech
- 2s, which is the first attempt to directly generate speech waveform from text in parallel, enjoying the benefit of
- full end-to-end training and even faster inference than FastSpeech. Experimental results show that 1) FastSpeech 2
- and 2s outperform FastSpeech in voice quality with much simplified training pipeline and reduced training time; 2)
- FastSpeech 2 and 2s can match the voice quality of autoregressive models while enjoying much faster inference
- speed.
- </em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ042-0094 (Ground-Truth) </th>
- <th style="text-align: center">LJ042-0094 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ042-0094_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ042-0094_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>the soviet authorities denied oswald permission</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ021-0108 (Ground-Truth) </th>
- <th style="text-align: center">LJ021-0108 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ021-0108_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ021-0108_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>from the standpoint of the good of the industries themselves, as well as the general public
- interest,</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ030-0031 (Ground-Truth) </th>
- <th style="text-align: center">LJ030-0031 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ030-0031_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ030-0031_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>secret service agents formed a cordon to keep the press and photographers from impeding their
- passage and scanned the
- crowd for threatening movements.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ001-0012 (Ground-Truth) </th>
- <th style="text-align: center">LJ001-0012 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ001-0012_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ001-0012_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>especially as no more time is occupied, or cost incurred, in casting, setting, or printing
- beautiful letters</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ011-0202 (Ground-Truth) </th>
- <th style="text-align: center">LJ011-0202 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ011-0202_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ011-0202_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>the uncle claimed her. the husband resisted.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ006-0114 (Ground-Truth) </th>
- <th style="text-align: center">LJ006-0114 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ006-0114_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ006-0114_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>they bought their offices from one another, and were thus considered to have a vested interest
- in them.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ009-0038 (Ground-Truth) </th>
- <th style="text-align: center">LJ009-0038 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ009-0038_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ009-0038_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>in the center of the chapel was the condemned pew, a large dock-like erection painted
- black.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ006-0161 (Ground-Truth) </th>
- <th style="text-align: center">LJ006-0161 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ006-0161_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ006-0161_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>again, a turnkey deposed that his chief did not enter the wards more than once a fortnight.</em>
- </div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ027-0127 (Ground-Truth) </th>
- <th style="text-align: center">LJ027-0127 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ027-0127_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ027-0127_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>while neglecting to maintain his unity of ideal in the case of nearly all the numerous species
- of snakes, he should have
- added a tiny rudiment in the case of the python</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">LJ050-0158 (Ground-Truth) </th>
- <th style="text-align: center">LJ050-0158 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ050-0158_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LJSpeech/LJ050-0158_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Text</b>: <em>the department hopes to design a practical system which will fully meet the needs of the
- protective research section of
- the secret service.</em></div>
- </p>
- <hr>
- <p>
- <h2>Mandarin Multi-Speaker TTS</h2>
- <div>
- <b>Dataset</b>: <a href="http://www.aishelltech.com/aishell_3">AISHELL-3</a>
- </div>
- <div>
- <b>Checkpoint</b>: <a
- href="https://drive.google.com/file/d/1uYWd5JlaK-fochQ2JFgIP_wOEkoLPLqs/view?usp=sharing">link</a>
- </div>
- <div>
- <b>Config</b>: <a href="https://github.com/ming024/FastSpeech2/tree/master/config/AISHELL3">link</a>
- </div>
- <div>
- <b>Vocoder</b>: <a href="https://github.com/jik876/hifi-gan">HiFi-GAN (universal)</a>
- </div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB08630110 (Ground-Truth) </th>
- <th style="text-align: center">SSB08630110 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB08630110_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB08630110_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0863</em></div>
- <div><b>Text</b>: <em>放首歌来听,丁当的歌 (fang4 shou3 ge1 lai2 ting1 sp ding1 dang1 de5 ge1)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB07940279 (Ground-Truth) </th>
- <th style="text-align: center">SSB07940279 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB07940279_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB07940279_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0794</em></div>
- <div><b>Text</b>: <em>中国,今日有望创下历史单,日,出行人次最高 (zhong1 guo2 sp jin1 ri4 you3 wang4 chuang4 xia4 li4 shi3 dan1 sp ri4 sp
- chu1 xing2 ren2 ci4 zui4 gao1)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB00800003 (Ground-Truth) </th>
- <th style="text-align: center">SSB00800003 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB00800003_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB00800003_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0080</em></div>
- <div><b>Text</b>: <em>中新网七月十四日电 (zhong1 xin1 wang3 qi1 yue4 shi2 si4 ri4 dian4)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB15930399 (Ground-Truth) </th>
- <th style="text-align: center">SSB15930399 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB15930399_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB15930399_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB1593</em></div>
- <div><b>Text</b>: <em>也就是说汤米踩过点 (ye3 jiu4 shi4 shuo1 tang1 mi3 cai3 guo4 dian3)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB16070080 (Ground-Truth) </th>
- <th style="text-align: center">SSB16070080 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB16070080_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB16070080_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB1607</em></div>
- <div><b>Text</b>: <em>但阵营之间差距拉大 (dan4 zhen4 ying2 zhi1 jian1 cha1 ju4 la1 da4)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB10200388 (Ground-Truth) </th>
- <th style="text-align: center">SSB10200388 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB10200388_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB10200388_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB1020</em></div>
- <div><b>Text</b>: <em>我抱着一大堆纸箱回了家 (wo3 bao4 zhe5 yi2 da4 dui1 zhi3 xiang1 hui2 le5 jia1)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB05900363 (Ground-Truth) </th>
- <th style="text-align: center">SSB05900363 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB05900363_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB05900363_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0590</em></div>
- <div><b>Text</b>: <em>我们不能分头行动 (wo3 men2 bu4 neng2 fen1 tou2 xing2 dong4)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB04700028 (Ground-Truth) </th>
- <th style="text-align: center">SSB04700028 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB04700028_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB04700028_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0470</em></div>
- <div><b>Text</b>: <em>毕节市的景点有什么 (bi4 jie2 shi4 de5 jing2 dian2 you3 shen2 me5)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB06290079 (Ground-Truth) </th>
- <th style="text-align: center">SSB06290079 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB06290079_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB06290079_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0629</em></div>
- <div><b>Text</b>: <em>你以为 (ni2 yi3 wei2)</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">SSB08220199 (Ground-Truth) </th>
- <th style="text-align: center">SSB08220199 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB08220199_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/AISHELL3/SSB08220199_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>SSB0822</em></div>
- <div><b>Text</b>: <em>该职位员工可以,居住在西雅,图或,洛,杉矶 (gai1 zhi2 wei4 yuan2 gong1 ke2 yi3 sp ju1 zhu4 zai4 xi1 ya3 sp tu2 huo4
- sp luo4 sp shan3 ji1)</em></div>
- </p>
- <hr>
- <p>
- <h2>English Multi-Speaker TTS</h2>
- <div>
- <b>Dataset</b>: <a href="https://research.google/tools/datasets/libri-tts/">LibriTTS</a>
- </div>
- <div>
- <b>Checkpoint</b>: <a
- href="https://drive.google.com/file/d/1M6BxJtTxYW56dG1Myz9MqZmG_OXJLUqy/view?usp=sharing">link</a>
- </div>
- <div>
- <b>Config</b>: <a href="https://github.com/ming024/FastSpeech2/tree/master/config/LibriTTS">link</a>
- </div>
- <div>
- <b>Vocoder</b>: <a href="https://github.com/jik876/hifi-gan">HiFi-GAN (universal)</a>
- </div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">207_131203_000011_000000 (Ground-Truth) </th>
- <th style="text-align: center">207_131203_000011_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/207_131203_000011_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/207_131203_000011_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>207</em></div>
- <div><b>Text</b>: <em>his mother, however, was a little shy of the company for him, and besides she could not always
- spare him.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">2299_6524_000057_000000 (Ground-Truth) </th>
- <th style="text-align: center">2299_6524_000057_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/2299_6524_000057_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/2299_6524_000057_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>2299</em></div>
- <div><b>Text</b>: <em>on the arrival at the hut to my chagrin we found it filled with snow.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">2388_153731_000003_000000 (Ground-Truth) </th>
- <th style="text-align: center">2388_153731_000003_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/2388_153731_000003_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/2388_153731_000003_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>2388</em></div>
- <div><b>Text</b>: <em>the story of the first scientific observation of the corona and the prominences is thrillingly
- interesting, and in fact dramatic.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">3615_14677_000014_000000 (Ground-Truth) </th>
- <th style="text-align: center">3615_14677_000014_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/3615_14677_000014_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/3615_14677_000014_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>3615</em></div>
- <div><b>Text</b>: <em>beat three eggs with a pinch of salt; add one pint of milk and two thirds of a cup of
- flour.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">4744_31668_000003_000001 (Ground-Truth) </th>
- <th style="text-align: center">4744_31668_000003_000001 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/4744_31668_000003_000001_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/4744_31668_000003_000001_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>4744</em></div>
- <div><b>Text</b>: <em>far from morbid naturally, she did her best to deny the thought, and so simple and unartificial
- was her type of mind that for weeks together she would wholly lose it.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">480_127525_000006_000000 (Ground-Truth) </th>
- <th style="text-align: center">480_127525_000006_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/480_127525_000006_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/480_127525_000006_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>480</em></div>
- <div><b>Text</b>: <em>i tried and found by experiment that the tide kept sweeping us westward until i had laid her
- head due east, or just about right angles to the way we ought to go.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">5126_27504_000011_000003 (Ground-Truth) </th>
- <th style="text-align: center">5126_27504_000011_000003 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/5126_27504_000011_000003_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/5126_27504_000011_000003_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>5126</em></div>
- <div><b>Text</b>: <em>it looked as if our luck was dead out, and we began to think our chance of getting across the
- border to queensland, and clear out of the colony that way, looked worse every day.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">6098_57837_000008_000000 (Ground-Truth) </th>
- <th style="text-align: center">6098_57837_000008_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/6098_57837_000008_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/6098_57837_000008_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>6098</em></div>
- <div><b>Text</b>: <em>word was sent of their predicament to the nearest fort, and lieutenant pershing was sent with a
- small detachment to their rescue.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">6544_231862_000065_000001 (Ground-Truth) </th>
- <th style="text-align: center">6544_231862_000065_000001 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/6544_231862_000065_000001_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/6544_231862_000065_000001_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>6544</em></div>
- <div><b>Text</b>: <em>denzil did look, and uttered a second cry more startling than the first.</em></div>
- <hr>
- <table>
- <tr>
- <th style="text-align: center">968_122545_000053_000000 (Ground-Truth) </th>
- <th style="text-align: center">968_122545_000053_000000 (Synthesized) </th>
- </tr>
- <tr>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/968_122545_000053_000000_ground-truth.wav" autoplay />
- </audio></td>
- <td style="text-align: center"><audio controls="controls">
- <source src="./demo/LibriTTS/968_122545_000053_000000_synthesized.wav" autoplay />
- </audio></td>
- </tr>
- </table>
- <div><b>Speaker</b>: <em>968</em></div>
- <div><b>Text</b>: <em>the bee fought the window angrily, up and down, up and down, for several minutes; then found the
- open glass and whirled
- out into the sunshine, joyfully.</em></div>
- </p>
- </body>
- </html>