|
@@ -7,35 +7,35 @@ categories = ["marketing"]
|
|
|
banner = "https://i.imgur.com/jdQb3ZH.jpg"
|
|
|
+++
|
|
|
|
|
|
- ## Creating An AI Text-to-Speech Using IBM Watson
|
|
|
+## Creating An AI Text-to-Speech Using IBM Watson
|
|
|
|
|
|
- ![img](https://analyticsindiamag.com/wp-content/uploads/2021/06/1-1.png)]
|
|
|
+![img](https://analyticsindiamag.com/wp-content/uploads/2021/06/1-1.png)
|
|
|
|
|
|
The past decade has seen some of the most groundbreaking developments in artificial intelligence. In recent years especially, data collection and analysis have grown considerably, helped by internet-connected devices and fast computer processing. The effects are visible in automobiles, with self-driving cars; in healthcare, with artificially intelligent robot systems that can aid a doctor during surgery; in manufacturing; and in much more. Artificial Intelligence, combined with the power of Machine Learning, has provided a wide spectrum of implementations and uses, with more still to be discovered in the years to come. Among the most fundamental of these advancements are Virtual Voice Assistants and voice and text recognition services. With the pace of life getting faster and busier every day, our voice has become an essential tool for issuing commands and getting results instantly. Consumer Virtual Assistants such as Alexa by Amazon, Siri by Apple, and Google Assistant have become part of our daily lives, whether to obtain information, to schedule and plan tasks, or for leisure. But have you ever pondered what goes on behind the scenes? We will explore one of those aspects, called Text-to-Speech.
|
|
|
|
|
|
-What is Text-to-Speech?
|
|
|
+### What is Text-to-Speech?
|
|
|
|
|
|
Text-to-Speech is a form of speech synthesis in which an algorithm converts written text into human-sounding speech. The main goal of Text-to-Speech is to generate natural-sounding speech signals, for example for voice assistants. It can also be a feature through which your computer or phone reads on-screen text aloud, often offered as an accessibility feature for people who have trouble reading on-screen text, and convenient for anyone who prefers to listen. Text-to-Speech has become so widespread that people encounter it every day without realizing it. Often abbreviated TTS, it is used in smart speakers, e-book readers, mapping and navigation software, word processors, and much more. The voice is usually computer-generated, and the reading speed can be sped up or slowed down. Many tools also highlight words as they are read aloud so that the user can see and hear the text simultaneously. Text-to-Speech is also a practical tool for converting large volumes of text into playable audio.
|
|
|
|
|
|
-About IBM-Watson Cloud
|
|
|
+### About IBM-Watson Cloud
|
|
|
|
|
|
IBM Cloud is a platform that provides a range of services, combining Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) into an integrated experience. It is positioned as one of the most open and secure public clouds for businesses: a hybrid multi-cloud platform with advanced data and AI capabilities and deep enterprise expertise across 20 industries. It is a full-stack cloud platform with over 170 products and services covering essential domains in Information Technology such as Data, Containers, AI, IoT, and Blockchain. The Cloud also provides solutions that enable higher levels of compliance, security, and management, with architecture patterns and methods for rapid delivery of mission-critical workloads. It is available across 19 countries and regions in North and South America, Europe, Asia, and Australia, so services can be deployed locally with global scalability.
|
|
|
|
|
|
The platform consists of multiple components that work together to provide a consistent and dependable cloud experience.
|
|
|
|
|
|
-Getting started with AI Text to Speech using Watson Text-to-Speech
|
|
|
+### Getting started with AI Text to Speech using Watson Text-to-Speech
|
|
|
|
|
|
We will try to get a flavour of what it takes to build a Text-to-Speech model and how it works. The following steps will be used to create one such model:
|
|
|
|
|
|
-We will first capture our text using python
|
|
|
+* We will first capture our text using Python
|
|
|
|
|
|
-We will then set up our Text-to-Speech Model Using The IBM-Watson TTS.
|
|
|
+* We will then set up our Text-to-Speech model using IBM Watson TTS
|
|
|
|
|
|
-Create an Output Mp3 file that contains the audio to our text
|
|
|
+* We will finally create an output MP3 file that contains the audio for our text
|
|
|
|
|
|
The following code implementation is in reference to the official implementation, whose video tutorial you can find here.
|
|
|
|
|
|
-Creating The TTS Model
|
|
|
+### Creating The TTS Model
|
|
|
|
|
|
First, we will install the IBM-Watson dependency library to help us call our modules. It can be installed through pip using the following command.
|
|
|
|
|
@@ -43,43 +43,71 @@ First, we will install the IBM-Watson dependency library to help us call our mod
|
|
|
|
|
|
!pip install ibm-watson
|
|
|
|
|
|
-Setup The Cloud Services and Authentication
|
|
|
+### Setup The Cloud Services and Authentication
|
|
|
|
|
|
We first need to set up the Watson Text to Speech service on IBM Cloud.
|
|
|
|
|
|
To do so, we’ll first go to cloud.ibm.com/catalog.
|
|
|
|
|
|
+![](https://lh6.googleusercontent.com/FWKv60dJUKcI00tvIr24ZUTCaqy9criEf5Re6zmNyTSCyPEzVHbUzbtAXmcnFo5_7OIMztvEBgtBVgMXHLo5dQfqPYQhYCNohmH2gIHvYEmO2dLc0_Gnk7VcQElhljzWmQ9JDPc6)
|
|
|
+
|
|
|
Click on services and from Category,
|
|
|
|
|
|
+![](https://lh5.googleusercontent.com/1bIWWRHGY3L9WnPhVsgRYtUirruPNykZ7GSQJh8NxdTVqt6K8q1Oe18-fxkW7KnPzyv9QrIESwQvJRUfRb-t_tUq-Nvw91WgoXIzWVS3oFvrz5sm0rJVH3YB9vXUeTRbUssjHVW8)
|
|
|
+
|
|
|
Tick the AI/Machine Learning checkbox to narrow the list down to the relevant service modules.
|
|
|
|
|
|
+![](https://lh3.googleusercontent.com/QF_iT8mieAPBaD-c8VNG6G4QBNRpXWPYTAfMBrKzD4jf5uyV9sArt_Fd6xDB_HCR4KgPi3G3w9b54joBd3eR9q1_hlsuKhdjDEHorAf_Ic42KVVvKjxXza2CyE3oBXzBIpbINfii)
|
|
|
+
|
|
|
Then click on Text to Speech, and select the free plan, which allows up to 10,000 characters to be converted per month.
|
|
|
|
|
|
+![](https://lh3.googleusercontent.com/AuRSzX4_iWVmqeBYdJyukKjOSQyb3NL6p922tvbfBbVOPfwrM4gJ_6Qu2XT9VqqpjLWKjLcMH_Mk9GxOwFWy6Z-PXXj0Wfmur6DAaKRd_uAlH_Wqc7xchJNnu6c5lnMDLichEjdo)
|
|
|
+
|
|
|
After doing so, we’ll write a few lines of Python code to authenticate our model.
|
|
|
|
|
|
-#setup our text-to-speech module from ibm_watson import TextToSpeechV1 from ibm_cloud_sdk_core.authenticators import IAMAuthenticator #Authenticate our Model
|
|
|
+#setup our text-to-speech module
|
|
|
+from ibm_watson import TextToSpeechV1
|
|
|
+from ibm_cloud_sdk_core.authenticators import IAMAuthenticator #Authenticate our Model
|
|
|
|
|
|
After the service is created, open Manage, copy the API key and URL, and paste them into our code.
|
|
|
|
|
|
-# Creds Text to Speech apikey = 'KEY HERE' url = 'URL HERE'
|
|
|
+![](https://lh5.googleusercontent.com/5tGHxGr5sG98dlOlmXhG-MMwk-1Tywl5Y_TsMBVGIScZxLh8jmeOXL0bixHv05gx_LcyzplkZSlcV4MPLDCgW6ALPiBX2Edo6fG9uVE2lC0kRi7yb50cYxBVN5XAzlcrPfPIK32O)
|
|
|
+
|
|
|
+#Creds Text to Speech
|
|
|
+apikey = 'KEY HERE'
|
|
|
+url = 'URL HERE'
|
|
|
|
|
|
Now, we will complete our final authentication from the server using the following code.
|
|
|
|
|
|
-#setup service authenticator = IAMAuthenticator(apikey) #Create our service tts = TextToSpeechV1(authenticator=authenticator) #set the IBM service url tts.set_service_url(url)
|
|
|
+#setup service
|
|
|
+authenticator = IAMAuthenticator(apikey)
|
|
|
+#Create our service
|
|
|
+tts = TextToSpeechV1(authenticator=authenticator)
|
|
|
+#set the IBM service url
|
|
|
+tts.set_service_url(url)
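The credentials above are pasted directly into the code. As a minimal sketch of an alternative, the key and URL could be read from environment variables so they never end up in a shared notebook. The variable names `TTS_APIKEY` and `TTS_URL` and both helper functions are hypothetical names chosen for this illustration; the service calls inside `build_tts_client` are the same ones used above.

```python
import os


def load_credentials(env=os.environ):
    """Return (apikey, url) read from a mapping such as os.environ.

    TTS_APIKEY and TTS_URL are hypothetical variable names used only
    for this sketch; pick whatever names suit your setup.
    """
    apikey = env.get("TTS_APIKEY")
    url = env.get("TTS_URL")
    if not apikey or not url:
        raise RuntimeError("Set TTS_APIKEY and TTS_URL before running.")
    return apikey, url


def build_tts_client(apikey, url):
    """Create and configure the service with the same calls as above."""
    # Imported here so load_credentials works without the SDK installed.
    from ibm_watson import TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    tts = TextToSpeechV1(authenticator=IAMAuthenticator(apikey))
    tts.set_service_url(url)
    return tts
```

With the two variables set, `tts = build_tts_client(*load_credentials())` would give the same `tts` object as the code above.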
|
|
|
|
|
|
-Demo Testing A Basic Language Model
|
|
|
+### Demo Testing A Basic Language Model
|
|
|
|
|
|
We will first test the service with a single line of text and write the result to an audio file named speech.mp3. To do this, we call the synthesize function from IBM Watson, which speaks the input text, and set the output to the MP3 audio format.
|
|
|
|
|
|
-with open('./speech.mp3', 'wb') as audio_file: res = tts.synthesize('Hello World!', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result() audio_file.write(res.content) #write the content to the audio file
|
|
|
+with open('./speech.mp3', 'wb') as audio_file:
+    res = tts.synthesize('Hello World!', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
+    audio_file.write(res.content) #write the content to the audio file
|
|
|
|
|
|
You will find the audio output in the path provided when the code is successfully executed.
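Since the same three steps (synthesize, get the result, write the bytes) recur throughout this walkthrough, they can be wrapped in a small helper. This is only a sketch: `synthesize_to_file` is a hypothetical name, and it assumes `tts` behaves like the client created above, that is, `tts.synthesize(...).get_result()` returns an object with a `content` attribute holding the audio bytes.

```python
def synthesize_to_file(tts, text, path, voice="en-US_AllisonV3Voice"):
    """Synthesize `text` with the given client and write the MP3 bytes to `path`."""
    if not text.strip():
        raise ValueError("Nothing to synthesize: the text is empty.")
    # Call the service first, so a failed request does not leave an empty file behind.
    res = tts.synthesize(text, accept="audio/mp3", voice=voice).get_result()
    with open(path, "wb") as audio_file:
        audio_file.write(res.content)
    return path
```

For example, `synthesize_to_file(tts, 'Hello World!', './speech.mp3')` would reproduce the demo above.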
|
|
|
|
|
|
-Reading Text from our File
|
|
|
+### Reading Text from our File
|
|
|
|
|
|
We will now use our tested model to create a text-to-audio file from the text file we have. Here I have used Winston Churchill’s speech as the text input.
|
|
|
|
|
|
-#testing our model using an audio file with open('/content/Churchill.txt', 'r') as f: text = f.readlines() #view the contents Text
|
|
|
+#testing our model using a text file
+with open('/content/Churchill.txt', 'r') as f:
+    text = f.readlines()
+#view the contents
+text
|
|
|
|
|
|
It will give us the following output.
|
|
|
|
|
@@ -113,106 +141,38 @@ text = ''.join(str(line) for line in text) #concatenate and feed it to the modul
|
|
|
|
|
|
Generating the Output
|
|
|
|
|
|
-Generating our output audio file created from the text, You can choose the voice according to the language you want, and the gender of voice needed. Furthermore, you can view all the details regarding the voices and languages available from here.
|
|
|
+To generate the output audio file from the text, you can choose a voice according to the language and the gender of voice you need. You can view the details of all available voices and languages [here](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices).
|
|
|
|
|
|
-with open('./winston.mp3', 'wb') as audio_file: res = tts.synthesize(text, accept='audio/mp3', voice='en-GB_JamesV3Voice').get_result() #selecting the audio format and voice audio_file.write(res.content) #writing the contents from text file to a audio file
|
|
|
+with open('./winston.mp3', 'wb') as audio_file:
+    #selecting the audio format and voice
+    res = tts.synthesize(text, accept='audio/mp3', voice='en-GB_JamesV3Voice').get_result()
+    audio_file.write(res.content) #writing the contents from the text file to an audio file
|
|
|
|
|
|
You will find your newly created audio file named “winston.mp3” inside the path provided!
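Before sending a larger file, it can help to flatten it into one string and compare its length with the free plan's monthly allowance mentioned earlier. The sketch below is an illustration under assumptions: `load_text` and `fits_free_plan` are hypothetical helpers, and Python's `len` is only an approximation of how the service counts characters. Collapsing whitespace also changes the input slightly, so skip that step if you want to hear how spacing affects the speech.

```python
import re

FREE_PLAN_MONTHLY_CHARS = 10_000  # the free-plan figure quoted in this article


def load_text(path):
    """Read a text file and return it as one string with tidy whitespace."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    text = "".join(lines)  # concatenate the list of lines into one string
    return re.sub(r"\s+", " ", text).strip()  # collapse newlines and repeated spaces


def fits_free_plan(text, already_used=0):
    """Return True if `text` still fits in the monthly allowance (approximate)."""
    return already_used + len(text) <= FREE_PLAN_MONTHLY_CHARS
```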
|
|
|
|
|
|
-Using a Different Language Model
|
|
|
+### Using a Different Language Model
|
|
|
|
|
|
You can use the same method to read text in a different language. Here, I have created another audio file from Spanish text, using a Spanish voice from IBM Watson.
|
|
|
|
|
|
-#input textcasa = """Mi nueva casa está en una calle ancha que tiene muchos árboles. El piso de arriba de mi casa tiene tres dormitorios y un despacho para trabajar. El piso de abajo tiene una cocina muy grande, un comedor con una mesa y seis sillas, un salón con dos sofás verdes, una televisión y cortinas. Además, tiene una pequeña terraza con piscina donde puedo tomar el sol en verano. Me gusta mucho mi casa porque puedo invitar a mis amigos a cenar o a ver el fútbol en mi televisión. Además, cerca de mi casa hay muchas tiendas para hacer la compra, como panadería, carnicería y pescadería." #synthesize and write output into a MP3 audio with open('./casa.mp3', 'wb') as audio_file: res = tts.synthesize(casa, accept='audio/mp3', voice='es-US_SofiaV3Voice').get_result() audio_file.write(res.content)
|
|
|
-
|
|
|
-EndNotes
|
|
|
-
|
|
|
-We have now learned how to create a model to convert our text files into MP3 audio files and implemented text-to-speech by performing the following steps. You can choose bigger text files and play with spaces and punctuations to see how the audio speed & speech differs from the original. The full Colab file for the following can be accessed from here.
|
|
|
-
|
|
|
-Happy Learning!
|
|
|
-
|
|
|
-References
|
|
|
-
|
|
|
-Join Our Telegram Group. Be part of an engaging online community. Join Here.
|
|
|
-
|
|
|
-Subscribe to our Newsletter
|
|
|
-
|
|
|
-Get the latest updates and relevant offers by sharing your email.
|
|
|
-
|
|
|
- ## Why business and academia need each other for better A.I.
|
|
|
-
|
|
|
- ![img](https://content.fortune.com/wp-content/uploads/2021/07/GettyImages-1035453302.jpg?resize=1200,600)]
|
|
|
-
|
|
|
-Jeff Bezos thanks Amazon workers and customers after space flight: ‘You paid for all of this’
|
|
|
-
|
|
|
- ## AI Strategies: What Is Natural Language Processing and How Can It Help Businesses?
|
|
|
-
|
|
|
- ![img](https://biztechmagazine.com/sites/biztechmagazine.com/files/styles/cdw_hero/public/articles/202107/nlp%20perfcon%20hero.jpg?itok=RszSk-uo)]
|
|
|
-
|
|
|
-Combining computing technologies with human language has become a driving force for modern-day technology.
|
|
|
-
|
|
|
-The experience of using a smartphone, for example, wouldn’t be quite the same without the ability to pull up a map with a computerized voice navigating your next turn. Tools like Google Lens, which can translate words captured by a camera on the fly, would not be quite as impressive.
|
|
|
-
|
|
|
-These tools represent just some of the power of natural language processing (NLP), a form of artificial intelligence that promises to have use cases far beyond smartphones.
|
|
|
-
|
|
|
-For businesses, the ability to process speech and written words in real time could prove essential as organizations hope to better understand consumer and employee sentiment, analyze data and automate tasks that once required careful manual analysis.
|
|
|
-
|
|
|
-Still, we may be only scratching the surface of NLP.
|
|
|
-
|
|
|
-What Is Natural Language Processing?
|
|
|
-
|
|
|
-At a high level, natural language processing describes a computer’s ability to process and comprehend language, whether in written, spoken or digital form.
|
|
|
-
|
|
|
-WATCH: Learn how to incorporate social responsibility into artificial intelligence.
|
|
|
-
|
|
|
-It’s often thought of as a very recent capability of computers. In fact, however, NLP dates to the earliest days of computers. For example, early optical character recognition systems relied on specialized fonts that computers could detect.
|
|
|
-
|
|
|
-Today, natural language processing is seen as mainstream and practical, with AI-powered smart assistants such as Google Assistant, Apple’s Siri, Amazon Alexa and Microsoft’s Cortana well established as mainstream use cases.
|
|
|
-
|
|
|
-AI has become crucial in business as well, and NLP is seen as a major area of growth for many companies’ AI strategies. The Global AI Adoption Index 2021, an IBM Watson project, found that nearly half of businesses are using some form of NLP technology, with another quarter of businesses expected to use it within the next 12 months.
|
|
|
-
|
|
|
-“The top use cases for NLP today — improving the customer experience and helping employees reach new levels of productivity — are critical priorities for nearly every business,” says Dakshi Agrawal, an IBM fellow and CTO for AI at IBM.
|
|
|
-
|
|
|
-What Are the Steps in NLP?
|
|
|
-
|
|
|
-The steps involved in natural language processing start with having access to data in its original form (a written message in a database, for example) and a language base to compare it with.
|
|
|
-
|
|
|
-After the data is collected, the information is broken down using several data preprocessing techniques. Among them:
|
|
|
-
|
|
|
-Tokenization, or breaking down words into more basic forms
|
|
|
-
|
|
|
-The removal of common stop words
|
|
|
-
|
|
|
-Lemmatization, the process of converting a word to its meaningful base form
|
|
|
-
|
|
|
-Part-of-speech tagging to determine what part of the sentence a given phrase appears in
|
|
|
-
|
|
|
-REGISTER: Learn more about leveraging AI to drive business initiatives in the weekly CDW Tech Talk series. Click the banner below to register.
|
|
|
-
|
|
|
- ## WHO chief says Covid hasn't defeated the Olympics
|
|
|
-
|
|
|
- ![img](https://s.yimg.com/hd/cp-video-transcode/prod/2021-07/21/60f7e3f6540ba326483abfca/60f7e3f6540ba326483abfcb_o_U_v2.jpg)]
|
|
|
-
|
|
|
-Best Life
|
|
|
-
|
|
|
-People who are fully vaccinated are the most protected against COVID, but that doesn't mean there isn't any risk. Breakthrough infections are being reported more and more, as overall infections in the U.S. have increased due to the Delta variant—and while most of these breakthrough cases have been mild, there have been a handful of serious cases. The Centers for Disease Control and Prevention (CDC) warned early on that no vaccine is 100 percent effective, and that a very small number of vaccinat
|
|
|
-
|
|
|
- ## Hawaii Climate Suits Belong In Fed. Court, 9th Circ. Told
|
|
|
-
|
|
|
- ![img](https://www.law360.com/images/360.png)]
|
|
|
-
|
|
|
-Law360 (July 20, 2021, 4:51 PM EDT) -- Chevron Corp. and other fossil fuel companies said Monday that Hawaii suits seeking climate change-related infrastructure damages clearly belong in federal court and urged the Ninth Circuit to reverse lower court rulings remanding the cases to state court. Chevron, ExxonMobil Corp. and other energy producers told the appeals court that not only did they at times work at the federal government's behest, meaning the suits lodged by Honolulu and Maui County can be removed to federal court on so-called "federal officer removal" grounds, but they also did substantial offshore drilling on the Outer Continental Shelf, which is governed by federal law....
|
|
|
-
|
|
|
-Stay ahead of the curve
|
|
|
-
|
|
|
-In the legal profession, information is the key to success. You have to know what’s happening with clients, competitors, practice areas, and industries. Law360 provides the intelligence you need to remain an expert and beat the competition.
|
|
|
-
|
|
|
-Access to case data within articles (numbers, filings, courts, nature of suit, and more.)
|
|
|
+#input text
+casa = """Mi nueva casa está en una calle ancha que tiene muchos árboles.
+El piso de arriba de mi casa tiene tres dormitorios y un despacho para trabajar.
+El piso de abajo tiene una cocina muy grande, un comedor con una mesa y seis sillas,
+un salón con dos sofás verdes, una televisión y cortinas.
+Además, tiene una pequeña terraza con piscina donde puedo tomar el sol en verano.
+Me gusta mucho mi casa porque puedo invitar a mis amigos a cenar o a ver el fútbol en mi televisión.
+Además, cerca de mi casa hay muchas tiendas para hacer la compra, como panadería, carnicería y pescadería."""
|
|
|
|
|
|
-Access to attached documents such as briefs, petitions, complaints, decisions, motions, etc.
|
|
|
+#synthesize and write output into an MP3 audio file
+with open('./casa.mp3', 'wb') as audio_file:
+    res = tts.synthesize(casa, accept='audio/mp3', voice='es-US_SofiaV3Voice').get_result()
+    audio_file.write(res.content)
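When switching between languages, keeping the voice names in one place avoids typos in the long identifiers. The lookup below is a small sketch that lists only the three voices used in this article; the `VOICES` table and `pick_voice` are hypothetical names, and the full set of voices is in the IBM documentation.

```python
# Voices used in this walkthrough, keyed by language code.
VOICES = {
    "en-US": "en-US_AllisonV3Voice",
    "en-GB": "en-GB_JamesV3Voice",
    "es-US": "es-US_SofiaV3Voice",
}


def pick_voice(language, default="en-US"):
    """Return the voice name for `language`, falling back to the default language."""
    return VOICES.get(language, VOICES[default])
```

The result can be passed straight to the `voice` argument of `tts.synthesize`, for example `voice=pick_voice('es-US')`.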
|
|
|
|
|
|
-Create custom alerts for specific article and case topics and so much more!
|
|
|
+### EndNotes
|
|
|
|
|
|
+We have now learned how to convert text files into MP3 audio files and implemented text-to-speech by following the steps above. You can try bigger text files and experiment with spacing and punctuation to hear how the pacing of the speech changes. The full Colab notebook for this walkthrough can be accessed from [here](https://colab.research.google.com/drive/1hKpYhI45y9aXJxRDe7WfpXBCO8JqZgZQ?usp=sharing).
|
|
|
|
|
|
+Happy Learning!
|