
Merge branch 'master' of http://git.choozmo.com:3000/choozmo/AI_Video_LP into master

Your Name 3 years ago
Parent commit 0be7a73186

+ 128 - 0
webSite/content/news/converttovideo.md

@@ -0,0 +1,128 @@
++++
+title = "Top 10 slides to MP4 Video Converters"
+date = "2021-07-18T00:21:34+08:00"
+tags = ["video marketing", "text to video"]
+type = "blog"
+categories = ["marketing"]
+banner = "img/banners/banner-3.jpg"
++++
+
+Software that easily converts PowerPoint presentations to videos is a useful tool for anyone who wants to share their files on social media, blogs, or video sharing sites. Converting your PowerPoint presentation to video before posting will protect your presentation from modification and repackaging by someone else. It will also allow you to play it on a wide array of video supporting devices like Smart TVs, tablets, and smartphones.
+
+But, if you’d like to convert your presentation to video, that doesn’t necessarily mean you need to look to third-party software. Beginning with version 2010, Microsoft PowerPoint itself can do just that. Here are the steps you need to follow to convert your PowerPoint slides into video format.
+
+How to Convert PowerPoint Slides (PPT) to MP4 Video
+----------------------------------------------------
+
+1\. Open the PowerPoint presentation you’d like to convert.
+
+2\. Click on **File**.
+
+3\. The next step will differ based on which version of PowerPoint you’re using. Click on 
+
+*   **Save & Send**, in PowerPoint 2010.
+*   **Export**, in version 2013 and above.
+*   **Save as Movie**, if using PowerPoint for Mac.
+
+4\. Select **Create a Video** in Windows or **Save as Movie** on Mac.
+
+5\. Proceed to the conversion process and wait for the output video to be created.
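+
+If you’d rather script this built-in export than click through the menus, the same feature is exposed through PowerPoint’s automation interface. Below is a minimal sketch, assuming Windows, PowerPoint 2010 or later, and the pywin32 package; the file paths are placeholders.
+
+```python
+# Sketch: drive PowerPoint's built-in "Create a Video" export via COM automation.
+# Assumes Windows, PowerPoint 2010+, and `pip install pywin32`. Paths are examples.
+import time
+
+import win32com.client
+
+powerpoint = win32com.client.Dispatch("PowerPoint.Application")
+# Open(FileName, ReadOnly, Untitled, WithWindow): open without showing a window.
+presentation = powerpoint.Presentations.Open(r"C:\slides\demo.pptx", False, False, False)
+
+# CreateVideo(FileName, UseTimingsAndNarrations, DefaultSlideDuration,
+#             VertResolution, FramesPerSecond, Quality)
+presentation.CreateVideo(r"C:\slides\demo.mp4", True, 5, 720, 30, 85)
+
+# The export runs in the background; poll until PowerPoint reports it is finished.
+# CreateVideoStatus: 1 = in progress, 2 = queued, 3 = done, 4 = failed.
+while presentation.CreateVideoStatus in (1, 2):
+    time.sleep(2)
+
+presentation.Close()
+powerpoint.Quit()
+```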
+
+It’s great to have this feature included in PowerPoint itself, but we found that the timing of slide transitions in a video converted this way can be off. So, if you find yourself in a situation like this, or if you use PowerPoint 2007, which doesn’t support conversion to video, here are the top ten PowerPoint to video converters:
+
+1\. iSpring River
+-----------------
+
+_Windows, Shareware_  
+_$97/year_
+
+![iSpring River PPT to video converter ](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593607502.png "iSpring River")
+
+[iSpring River](https://www.ispringsolutions.com/ispring-river) integrates with PowerPoint and allows you to convert your slideshows to MP4 format and upload them directly to YouTube in a single click. It can help teachers and trainers in the education and corporate sectors to turn even the most complex PowerPoint presentations into crystal-clear 1080p HD format. The video file the tool creates is compatible with Windows, Mac, and Android operating systems.
+
+2\. PowerVideoPoint Lite
+------------------------
+
+_Windows, Freeware_
+
+![](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606150.png)
+
+The Free PPT to Video [Converter by Digital Office Pro](http://www.digitalofficepro.com/powerpoint/ppt-to-dvd-lite.html) will convert your PowerPoint presentation file to WMV, MOV, MKV, and ASF formats. Video files are compatible with most mobile devices, including Apple and Android smartphones.
+
+3\. RZ PowerPoint Converter
+---------------------------
+
+_Windows, Shareware_  
+_$39.99 for a lifetime license_
+
+![RZ PowerPoint Converter](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606205.png "RZ Soft")
+
+[RZ Soft](http://www.freepowerpointtovideo.com/download-free-powerpoint-to-video-dvd-converter.html) is a PowerPoint to video converter that supports all versions of PowerPoint (going back to 2003) and all PowerPoint formats. The software adjusts the final output size and resolution to user specifications.
+
+4\. Leawo 
+----------
+
+_Windows, Shareware_  
+_$44.95 for a lifetime license_
+
+![Leawo Converter](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606250.png "Leawo")
+
+Free to try, [Leawo PowerPoint Converter](https://www.leawo.com/pro/powerpoint-video-converter.html) will convert your PowerPoint slideshow into popular video formats. A dated graphical interface can be accepted as a trade-off for an ability to batch convert presentations.
+
+5\. Online Convert
+------------------
+
+_Online, Freeware_
+
+![Online Convert converter](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606285.png "Online Convert")
+
+[Online Convert](http://www.online-convert.com/) is a free web service that converts your PowerPoint to multiple video formats and allows you to download the final video to your computer. The site also offers a number of other useful file converters, all free, and with no watermark in the final output file.
+
+6\. ImTOO Convert PowerPoint to Video Free
+------------------------------------------
+
+_Windows, Freeware_  
+_$29.95 for a lifetime license_
+
+![ImTOO Convert PowerPoint to Video Free](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606834.png "ImTOO ")
+
+The [ImToo](http://www.imtoo.com/convert-powerpoint-to-video.html) converter boasts the ability to convert your PowerPoint files to video without the need to have PowerPoint installed on your computer. This software also enables you to add commentaries, watermarks, and music to your PowerPoint before converting it.
+
+7\. Xilisoft
+------------
+
+_Windows, Shareware_  
+_$49.95 for a lifetime license_
+
+![Xilisoft converter](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606402.png "Xilisoft")
+
+[Xilisoft PowerPoint to MP4 Converter](http://www.xilisoft.com/video-converter-software.html) is a doppelganger of the ImTOO converter with a slightly more limited functionality in the free version. It allows you to convert PowerPoint presentations to a number of popular video formats, including full high-definition (HD) quality. This software provides a number of tools to help you customize and tweak your presentation prior to conversion.
+
+8\. Moyea PPT to Video Converter
+--------------------------------
+
+_Windows, Shareware_  
+_$49.95 for a lifetime license_
+
+Like Leawo, MoyeaSoft PowerPoint to Video Converter can convert your PowerPoint slideshow to a variety of popular video formats, preserving all the effects in the original file. Video files are compatible with most tablets, portable media players, video game consoles, and mobile phones.
+
+9\. E.M. PowerPoint Video Converter Pro
+---------------------------------------
+
+_Windows, Shareware_  
+_$45.95 for a lifetime license_
+
+![Etinysoft PowerPoint Video Converter ](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606587.png "E.M. PowerPoint Video Converter PRO")
+
+[ETinySoft PowerPoint Video Converter](http://www.effectmatrix.com/PowerPoint-Video-Converter/) claims to be an all-in-one PowerPoint to video converter, capable of converting files to almost all popular video formats. The site provides a number of tutorial videos to help users get started.
+
+10\. VeryPDF
+------------
+
+_Windows, Shareware_  
+_$19.95 for a single personal user license_
+
+![VeryPDF converter](https://www.ispringsolutions.com/blog/wp-content/uploads/editor/2020/07/ispring-blog-image-1593606694.png "VeryPDF")
+
+The converter by [VeryPDF](http://www.verypdf.com/powerpoint-to-video/index.html) will convert your PowerPoint presentation to a number of video formats and is compatible with most versions of Windows, including Vista and XP.
+
+Conclusion
+----------
+
+There are many features to compare when considering the right PowerPoint conversion software. Quality of output, support of PowerPoint effects, compatibility with modern devices, and one-click publishing are all important factors to take into consideration. If you need a reliable PowerPoint to video converter that does the job no matter how many slides and animations are in your presentation, get a [trial of iSpring River](https://www.ispringsolutions.com/ispring-river/download), the number one option on our list.

+ 32 - 0
webSite/content/news/genvideofromtxt.md

@@ -0,0 +1,32 @@
++++
+title = "From Plain Text to AI-Generated Video"
+date = "2021-07-18T00:21:34+08:00"
+tags = ["video marketing", "text to video"]
+type = "blog"
+categories = ["marketing"]
+banner = "img/banners/banner-3.jpg"
++++
+
+### WHY THIS MATTERS IN BRIEF
+ 
+In recent years, “synthetic media” has become a general term used to describe video, image, text, and voice that computers generate. With these advances, we are about to see a major paradigm shift in media creation.
+Companies like Rosebud AI and Humen are disrupting the multimedia creation space by synthesizing videos and images, potentially saving creative agencies and studios millions of dollars in asset creation.
+Imagine not being able to program or code, but still being able to write a description or a script and have an AI create an HD image or video of it for you. This is the technology that’s now arriving.
+
+In 2016, an [Artificial Intelligence](https://www.fanaticalfuturist.com/tag/artificial-intelligence) (AI) won an award for [best short film](https://www.fanaticalfuturist.com/2016/07/eclipse-the-worlds-first-ai-produced-short-film-hits-the-screens-at-cannes/) at the Cannes Film Festival in France; in 2017, another created the world’s first AI [music album](https://www.fanaticalfuturist.com/2017/05/inside-the-sony-lab-making-the-worlds-first-ai-music-album/) for [Sony](https://www.fanaticalfuturist.com/tag/sony/); and elsewhere, others began [innovating](https://www.fanaticalfuturist.com/tag/creative-machines/) and creating everything from [winter scenes](https://www.fanaticalfuturist.com/2017/12/nvidias-newest-ai-turns-sunny-streets-into-snow-filled-ones-in-real-time/) that help train better self-driving cars to new product designs, including [clothing](https://www.fanaticalfuturist.com/2017/08/amazon-is-building-a-creative-ai-that-can-design-clothes/), [sneakers](https://www.fanaticalfuturist.com/2017/01/under-armours-new-trainers-are-inspired-by-nature-designed-by-an-ai-and-3d-printed/) and even the world’s first [self-evolving robot](https://www.fanaticalfuturist.com/2017/01/norwegian-robot-learns-to-self-evolve-and-3d-print-itself-in-the-lab/). And all these AIs have one thing in common – they’re all “creative.”
+
+AI is getting better and better at creating what’s known as “generative content”: in short, content such as images, music, scripts, or, let’s face it, text that AIs can make by themselves with little or, more often, no input from humans. Recent examples include photo-realistic images of [fake celebrities](https://www.fanaticalfuturist.com/2017/11/nvidias-newest-ai-is-creating-scarily-realistic-photos-of-fake-celebrities/) and a growing number of AI-composed music albums from artists such as [Amper](https://www.fanaticalfuturist.com/2017/08/the-worlds-first-ai-produced-music-album-breaks-cover/), [DeepBach](https://www.fanaticalfuturist.com/2017/01/bach-lives-again-in-ai-form/), [Magenta](https://www.fanaticalfuturist.com/2016/06/google-is-teaching-robots-to-make-music/), and [Flow Machines](https://www.fanaticalfuturist.com/tag/flow-machines/), all of them AIs. Now, though, scientists are working on building AIs that can create generative video. The idea is that simply by typing out a phrase, an AI could create a video of that scene, and scientists at [Duke University](https://www.fanaticalfuturist.com/tag/duke-university/) and [Princeton University](https://www.fanaticalfuturist.com/tag/princeton-university/), following on from [Microsoft](https://www.fanaticalfuturist.com/tag/microsoft), who recently [unveiled their own version](https://www.fanaticalfuturist.com/2018/03/microsofts-newest-ai-is-an-artist-with-an-imagination/) that does the same but just for images, have created a working model.
+
+“Video generation is intimately related to video prediction,” say the authors in their new paper. Video prediction, where an AI attempts to predict what actions come next in a video, has long been a goal of many AI researchers and, for obvious reasons, security companies, but so far, other than a product preview from MIT whose AI managed to predict what happened next in a cycle race, there have been relatively few successes.
+
+Visual representations, however, especially moving ones, often contain a wide variety of actions and outcomes, so as a first step the researchers had their AI learn from a narrow range of easily defined activities taken from Google’s Kinetics Human Action Video Dataset, including sports such as cycling, football, golf, hockey, jogging, sailing, swimming and water skiing. The AI then studied these clips and learned to identify each motion, refining its neural network as it went.
+
+With a dataset in place, the researchers then used a two-step process to create the generative video. The first step was to create an AI that could generate video based on just a text description, and then came the second stage, the creation of a second “Discriminator” AI.
+
+For example, if the text input was to create a video of “biking in snow,” the first AI would produce a video and the second, the discriminator, would judge it and compare it to a real video of someone biking in the snow, and any improvements or recommendations would be automatically fed back into the model so that over time the results got better and better until the generative video was indistinguishable from the real thing.
+
+While the team’s work is still in its earliest stages, with the new AI only capable of creating videos that are 32 frames long and the size of a postage stamp, over time they will get longer, bigger and better quality. As it turns out, the AI finds humans, with our bodies and our unpredictable actions, the most problematic, but to get a better grasp on us flesh bags the team is now training it to understand how the human skeleton works.
+
+Beyond the obvious nightmare of fake news generation, an example of which I showed off recently during my talk on the Future of Trust in London, where another generative AI was used to create a thoroughly convincing fake Obama news clip, there are genuine uses for generative video, such as helping train self-driving cars by producing realistic road and traffic simulations, or helping athletes train better by simulating game play.
+
+Either way, it’ll be a while before we see any AI-produced films, but we’re now at the start of the journey, and if following AI developments has taught me one thing, it’s that it won’t be decades before we see one – it’ll be years.

+ 93 - 0
webSite/content/news/slidetovideo.md

@@ -0,0 +1,93 @@
++++
+title = "5 Easy Steps To Turn Google Slides Into An Engaging Video"
+date = "2021-07-21T00:21:34+08:00"
+tags = ["video marketing", "text to video"]
+type = "blog"
+categories = ["marketing"]
+banner = "img/banners/banner-3.jpg"
++++
+
+Easily turn your Google Slides presentations into an engaging, sharable video with Screencast-O-Matic. It only takes a few minutes to record your slides. You can add your voice narration and show your face via a webcam as part of your presentation. 
+
+You can also get really creative with your [Google Slides videos](https://screencast-o-matic.com/integrations/google-slides/) by adding a green screen to [remove your background](https://screencast-o-matic.com/greenscreen). Enhance your slides even more by mixing and matching content from multiple devices. You can add [stock photos](https://screencast-o-matic.com/stock-library) or videos, or personalize it with music.
+
+You won’t need any experience to get started. Just follow this guide to turn your Google Slides into a video in five easy steps.
+
+Below is a quick video to show you how to turn your Google Slides presentations into a video:
+
+### **1\. Launch the Free Screen Recorder**
+
+Choosing the right software to record your Google Slides presentation is essential. We recommend our screen recorder because it’s intuitive, easy to use, and affordable on any budget.
+
+The screen recorder is available on Windows, Mac, Chromebook, and Android or iOS mobile devices. Use it to create simple screen recordings or more elaborate videos.
+
+Once your Google Slides presentation is ready to go, launch the [screen recorder](https://screencast-o-matic.com/screen-recorder). If you have an account, you can simply head to your account page and click on the screen recording icon to get started. 
+
+If you don’t have a Screencast-O-Matic account, you can still access the recorder.  We recommend having an account since it enables you to save and share your recordings from your hosting account. 
+
+### **2\. Record your Google Slides presentation**
+
+![Google Slides video recording - screen recorder](https://dfjnl57l0uncv.cloudfront.net/cms-sandbox/wp-content/uploads/2019/01/02133055/Screen-Shot-2019-01-15-at-12.43.34-PM-300x169.jpg "Google Slides video recording - screen recorder")
+
+After launching the screen recorder, a transparent recorder box will appear on your screen. Drag and drop the sides of this box so that your Google Slides presentation fits inside.
+
+For best results, set the recorder size to 720p then size your Google Slides presentation to fit within that box. This preset size makes it easy to share your video with your audience.
+
+In the bottom left corner, you’ll see all the controls you need to record your video:
+
+*   Select whether to record your screen, webcam, or both.
+*   See the maximum recording time available for your recording. 
+*   In addition to click and drag, you can also choose a preset window size for your recording: 480p, 720p, or full screen. We recommend 720p for the clearest image of your Google Slides or PowerPoint presentation.
+*   Use the narration option to record presentation audio while you film. Click the arrow to select which microphone to use. Click “none” to disable narration. 
+
+When you’re happy with your recording settings, click “record.” You’ll see a quick countdown, after which you’re ready to film. 
+
+**Don’t forget Green Screen:** [Green Screen](https://screencast-o-matic.com/greenscreen) is perfect for recording slideshows. It removes your webcam background, putting you directly in front of your slide deck. Enable Green Screen by clicking the magic wand icon, or find the effect in the Video Editor after you record.
+
+### **3\. Finish your recording**
+
+![](https://dfjnl57l0uncv.cloudfront.net/cms-sandbox/wp-content/uploads/2019/01/02133159/Screen-Shot-2019-01-15-at-12.44.36-PM-300x39.jpg "google-slides-recorder")Done recording? When you’re happy with your audio and video, click the blue button to stop recording, and select “done” to save your project. If you need to delete your recording and restart, click the trash icon.  
+
+![](https://dfjnl57l0uncv.cloudfront.net/cms-sandbox/wp-content/uploads/2020/12/28120735/Screen-Shot-2019-01-15-at-12.44.26-PM-300x152.jpg "Google-slides-videos")After clicking “done,” your recording will appear in a new window along with options to save and publish. If you’d like to go ahead and publish without editing your video, skip ahead to step five.  
+
+### **4\. Add effects using the Video Editor**
+
+![](https://dfjnl57l0uncv.cloudfront.net/cms-sandbox/wp-content/uploads/2019/01/02133321/Screen-Shot-2019-01-15-at-12.44.59-PM-300x174.jpg "google-slides-video-editor")
+
+You have powerful editing options with both free and paid plans. As a free plan user, you can trim the start and end of your video to remove awkward pauses. 
+
+You can also [add captions](https://screencast-o-matic.com/tutorial/upload-captions-file-for-free/) to your video to make it more accessible for hearing impaired students and those who use assistive technology.
+
+Deluxe and Premier users can also use speech-to-text, type captions manually in the Interactive Captions Editor, or use the Scripted Recordings feature to use your captions as a script while you record.
+
+If you really want to keep students engaged, you may wish to get a little creative with your editing. Deluxe and Premier users have access to an even wider range of handy video editing tools. Click “edit” to begin.
+
+A menu will appear above your video timeline with the following tools and more:
+
+**Overlays:** You can add images or additional video clips, blur out sensitive information, use an outline to emphasize certain points, add an arrow, provide additional text, or zoom in on and highlight a specific area of your recording.
+
+**Stock music:** Access an expansive stock music library to fit any mood.
+
+**Transitions:** For a professional look, add smooth transitions between each page of your slide deck.  Screencast-O-Matic has dozens of transitions to choose from.
+
+**Narrate:** If you forgot to mention something or skipped over an important topic, you can use the narrate tool to record your voice narration over sections of your recording.
+
+**Green Screen:** If you used a webcam in your recording, you could remove the background so you appear on the screen in front of your slides.
+
+
+### **5\. Publish and share your Google Slides video**
+
+![](https://dfjnl57l0uncv.cloudfront.net/cms-sandbox/wp-content/uploads/2020/12/28120740/Screen-Shot-2019-01-15-at-12.47.10-PM-155x300.jpg "google-slides-publishing")
+
+There are many options for you to upload, share and publish your videos.
+
+Access one-click publishing for future recordings with a Screencast-O-Matic account. You can also publish directly to a YouTube channel or Google Drive folder, or save your video as a file. 
+
+Deluxe users can also publish to Vimeo and Dropbox.
+
+Finally, quickly share your video directly to Facebook, Twitter, or your social media platform of choice with a URL. 
+
+### **Turn presentations into engaging videos**
+
+It’s so easy to record Google Slides and create an attention-grabbing presentation design with Screencast-O-Matic.
+
+The videos you create will be more shareable and engaging than a regular slide deck. Best of all, with an arsenal of video lessons on your side, you can say goodbye to repeating yourself during class. 

+ 48 - 0
webSite/content/news/texttovideo-en-startup.md

@@ -0,0 +1,48 @@
++++
+title = "Use AI to automatically generate videos from text articles?"
+date = "2021-07-18T00:21:34+08:00"
+tags = ["video marketing", "text to video"]
+type = "blog"
+categories = ["marketing"]
+banner = "img/banners/banner-3.jpg"
++++
+
+Publishers are continuously looking for new ways to expand their reach. One sure bet? Video.
+
+Even Facebook co-founder Mark Zuckerberg recognizes this. Discussing the company’s performance last year, he told media how video remains central on the News Feed of its users. “We’re entering into a period where that’s increasingly going to be video – and we’re seeing huge growth there,” he was [quoted by _Wired_ as saying](http://www.wired.com/2015/07/mark-zuckerberg-future-immersive/).
+
+Yet not all publishers have the ability to produce video content. Taiwan-based [GliaCloud](https://www.gliacloud.com/en/) wants to offer them help.
+
+> The company uses artificial intelligence to automatically create video summaries of text articles.
+
+GliaCloud’s product, GliaStudio, uses artificial intelligence to automatically create video summaries of text articles. What it does is analyze and summarize a text story and generate a video out of the data – complete with voiceover as well as photos and video clips from its content partners and public sources.
+
+Launched in 2015, the company is the brainchild of David Chen, who is recognized as [one of 48 Google cloud developer experts](https://developers.google.com/experts/people/chien-hsun-chen) worldwide, and Dominique Tu, who has over 20 years’ experience in business development with a solid network in the advertising industry.
+
+The team’s pitch is pretty straightforward: video is more or less now a necessity because it appeals to consumers’ visual nature, but producing one is expensive and time-consuming. With GliaCloud, publishers can now create videos out of their own content, in just a few minutes, and at little cost.
+
+Publishers may choose to pay for it per use or split the ad revenue they generate from the videos with GliaCloud. The company also offers a free version with embedded advertising to individual users and shares whatever revenue it earns with them.
+
+“Large and individual publishers can utilize our patented Chinese sentimental analytics technology to easily create videos, with just a few clicks,” says company COO Agnes Peng.
+
+She adds that GliaCloud also provides publishers data analytics services to see how the videos have performed in terms of views and consumer feedback, among other metrics. “Our generated videos can enhance the click-through rate of their social media posts, bring more traffic, and lead to more profits.”
+
+Video is king
+-------------
+
+Video content gets more organic reach than any other type of post, and most online publishers heavily rely on social networking sites such as Facebook to bring traffic, notes Agnes.
+
+Citing a [report](http://syndacast.com/video-marketing-statistics-trends-2015/) by Syndacast, Agnes emphasizes the significance of video for online media.
+
+“Syndacast predicts that 74 percent of all internet traffic in 2017 will be video. Video is widely considered as one of the best marketing tools for the online advertising industry. The global market for online video ads is expected to reach US$19 billion by 2017, while the Asian market is expected to achieve US$10 billion in 2020.”
+
+That’s a huge pie that Agnes says GliaCloud will most definitely take a bite out of, given the novelty of its service. We haven’t heard of a similar offering in the region right now, but one company named [Wibbitz](http://www.wibbitz.com/) is doing the same thing in the US.
+
+In terms of quality, well, the AI-created videos are not as sophisticated as the ones created by media organizations in-house. They’re short and simple – no fancy text layouts, graphics, or transitions – but they will do if you’re just looking for bite-sized, quick news.
+
+That seems only right, since GliaCloud is not expected to replace media groups’ video production teams. The service is positioned as a way to supplement what those teams are already doing.
+
+GliaCloud has tapped BusinessNext, one of the largest tech media companies in Taiwan, as one of its “testing partners.” Other local media outlets using the service are in the sports and entertainment sectors, though Agnes declines to disclose specific names, citing “confidentiality.”
+
+As it’s still in beta, the startup also can’t provide revenue figures or any financial information yet, says Agnes.
+
+If you want to see it for yourself, here’s a sample video produced using GliaCloud:

+ 203 - 0
webSite/content/news/texttovideo-generation.md

@@ -0,0 +1,203 @@
++++
+title = "Can artificial intelligence generate new video content from text descriptions?"
+date = "2021-07-21T00:21:34+08:00"
+tags = ["video marketing", "text to video"]
+type = "blog"
+categories = ["marketing"]
+banner = "img/banners/banner-3.jpg"
++++
+
+**This project aims to build a deep learning pipeline that takes text descriptions and generates unique video depictions of the content described.** 
+
+The crux of the project lies with the Generative Adversarial Network, a deep learning algorithm that pits two neural networks against each other in order to produce media that is unique and realistic.
+
+![Credit:  Scott Reed](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1502497882210-13WGEC0U63YKVUWONMND/basic_gan?format=1500w)
+
+Credit: [Scott Reed](https://github.com/reedscot/icml2016)
+
+This model consists of a generative network and a discriminative network. While the generator produces new content, the discriminator tries to identify the generator's work from a pool of real and fake (aka generated) media. The discriminator produces a "real" or "fake" output label for each piece of content made by the generator. The "fake" labels are then treated as errors in the generator's back-propagation.
+
+This adversarial design has been shown to greatly outperform many generative models used in the field. As the discriminator gets better at distinguishing the computer-generated from the human-generated, the generator improves in producing more realistic media.
+
+Perhaps the two largest downsides of using Generative Adversarial Networks are that they are both hard to train and hard to evaluate. We'll discuss some techniques used to mitigate these challenges in this project.
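+
+To make the generator/discriminator interplay described above concrete, here is a minimal sketch of one adversarial training step. It is an illustration only, not this project's code: the toy models, dimensions, and optimizer settings are my own assumptions, written against TensorFlow/Keras.
+
+```python
+# Minimal sketch of the adversarial training loop described above (illustrative only).
+import numpy as np
+from tensorflow.keras import layers, models, optimizers
+
+latent_dim, data_dim = 64, 784   # toy sizes for the sketch
+
+generator = models.Sequential([
+    layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
+    layers.Dense(data_dim, activation="tanh"),
+])
+
+discriminator = models.Sequential([
+    layers.Dense(128, activation="relu", input_shape=(data_dim,)),
+    layers.Dense(1, activation="sigmoid"),   # outputs a "real" vs. "fake" score
+])
+discriminator.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")
+
+# Combined model: generator followed by a frozen discriminator. The discriminator's
+# "fake" verdicts become the error signal back-propagated into the generator.
+discriminator.trainable = False
+combined = models.Sequential([generator, discriminator])
+combined.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")
+
+def train_step(real_batch):
+    n = real_batch.shape[0]
+    noise = np.random.normal(size=(n, latent_dim))
+    fake_batch = generator.predict(noise, verbose=0)
+
+    # 1) Train the discriminator to separate real samples from generated ones.
+    d_loss_real = discriminator.train_on_batch(real_batch, np.ones((n, 1)))
+    d_loss_fake = discriminator.train_on_batch(fake_batch, np.zeros((n, 1)))
+
+    # 2) Train the generator (through the frozen discriminator) to be labeled "real".
+    g_loss = combined.train_on_batch(noise, np.ones((n, 1)))
+    return 0.5 * (d_loss_real + d_loss_fake), g_loss
+```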
+
+***
+
+**Dataset**
+===========
+
+For training, I used The Max-Planck Institute for Informatics (MPII) Movie Description dataset. The dataset includes short movie snippets, as well as textual depictions of what is featured in each video. The text comes from an audio description service aimed at helping visually impaired people better follow a movie. More information on the dataset can be found in [this published paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Rohrbach_A_Dataset_for_2015_CVPR_paper.pdf).
+
+I used video-description pairs from 9 romantic comedies with the aim of training my algorithm to generate videos of humans in action.
+
+![Credit:  CVPR Paper](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504598062518-YXV98L1WM8B0FUVDXO2Y/image-asset.png?format=1000w)
+
+Credit: [CVPR Paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Rohrbach_A_Dataset_for_2015_CVPR_paper.pdf)
+
+***
+
+**ML Pipeline**
+===============
+
+To recap, the goal of this project is to input text descriptions into a series of ML models that produce a video of said description as output.
+
+The overall pipeline looks something like this:
+
+*   Vectorize and embed the text into latent space
+*   Use GANs to expand the text embeddings into a series of images
+*   Convert the series into a GIF
+
+### **Embedding Text Descriptions**
+
+![Screen Shot 2017-09-04 at 11.15.53 PM.png](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504593131052-TA1UO3T9TQ2JQ9FOVGZ8/Screen+Shot+2017-09-04+at+11.15.53+PM.png?format=1500w)
+
+_Find Keras code for the Variational Autoencoder used in the project [here](https://github.com/Toni-Antonova/VAE-Text-Generation)._
+
+I first vectorized the text descriptions using Facebook's fastText word vectors. This was done by concatenating the word vectors in each sentence. The vast majority of the descriptions in the dataset are 25 words or less, so I limited the concatenated vector length to 7500 dimensions (300-dimensional word vectors \* 25 words). Descriptions that were shorter than 25 words had their vectors extended to the 7500-dimensional size with a padding of zeros.
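+
+As a rough illustration of that concatenate-and-pad step (my own sketch, not the project's code; `word_vec` is a hypothetical lookup returning a 300-dimensional fastText vector):
+
+```python
+# Sketch of the concatenate-and-pad text vectorization described above (illustrative).
+import numpy as np
+
+DIM, MAX_WORDS = 300, 25          # 300-d word vectors, descriptions capped at 25 words
+
+def embed_description(words, word_vec):
+    words = words[:MAX_WORDS]                      # truncate overly long descriptions
+    vecs = [word_vec(w) for w in words]            # one 300-d vector per word
+    flat = np.concatenate(vecs) if vecs else np.zeros(0)
+    padded = np.zeros(DIM * MAX_WORDS)             # 7500-dimensional target vector
+    padded[:flat.size] = flat                      # zero-pad shorter descriptions
+    return padded
+```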
+
+I chose to concatenate the word vectors instead of average them in an attempt to keep some of the semantic ordering intact. Variational Autoencoders have been shown to embed entire sentences into latent representations quite well, sometimes outperforming LSTMs ([study linked here](https://arxiv.org/pdf/1511.06349.pdf)). With this in mind, I ran the description vectors through a VAE in order to reduce their dimensionality and get more meaningful embeddings.
+
+![siamese network.png](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504422753628-GS7NAGI39DDOVDK2XGAM/siamese+network.png?format=1500w)
+
+_Find Keras code for the Multimodal Embedding Network used in the project_ _[here](https://github.com/Toni-Antonova/Joint-Multimodal-Embedding)._
+
+The Variational Autoencoder works to cluster embeddings with similar semantic patterns. However, visualizing that text down the road requires a more nuanced embedding framework.
+
+Visualizations of thought tend to bring out a lot of the implicit context present in the explicit text. Descriptions of birds tend to visually elicit tree branches and bird houses. Descriptions of kicking a ball can lead us to imagine soccer, green grass, and shorts. So how can a model learn to pick up on the implicit meaning of language? And is there any way to help it along the way?
+
+[Joint Multimodal Embedding Networks](https://antonia.space/text-to-video-generation#) have been shown to provide promising results in this direction. They try to cluster lower dimensional representations of different media with similar subject matter. I used a Siamese Network with text and image encoders to develop this type of design. The model decreases the Euclidean distance between embeddings of images and their text descriptions and increases the Euclidean distance between embeddings of images and unrelated text.
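+
+A compact sketch of that pairwise objective (an illustration under hypothetical names, not the project's Keras code): embeddings of matching description/image pairs are pulled together, and mismatched pairs are pushed at least a margin apart.
+
+```python
+# Sketch of a contrastive objective for the joint text/image embedding described above.
+import numpy as np
+
+def contrastive_loss(text_emb, image_emb, is_match, margin=1.0):
+    """is_match = 1 for a real description/image pair, 0 for an unrelated pair."""
+    dist = np.linalg.norm(text_emb - image_emb)      # Euclidean distance
+    if is_match:
+        return dist ** 2                             # pull matching pairs together
+    return max(margin - dist, 0.0) ** 2              # push mismatches at least `margin` apart
+```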
+
+The text encoder in the trained Siamese Network was then used to create the final latent embeddings for each video description. This encoder added several fully-connected layers on top of frozen layers from the pre-trained VAE encoder above.
+
+### **Stacking GANS**
+
+Once the descriptions are embedded into a lower dimensional space, they can be used as inputs in a Generative Adversarial Network.
+
+The GANs used in the project were adapted from these two papers: 
+
+[Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/pdf/1612.03242v1.pdf)
+
+[Generating Videos with Scene Dynamics](http://carlvondrick.com/tinyvideo/paper.pdf)
+
+The first GAN was trained to convert text descriptions into image depictions of the text's content. The second GAN was trained to take those generated images as input and extend them into a series of 32 frames. 
+
+![Screen Shot 2017-09-04 at 11.31.14 PM.png](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504593219951-QAJWL5Y3BPDXE4D7BVYA/Screen+Shot+2017-09-04+at+11.31.14+PM.png?format=1500w)
+
+_Find Tensorflow code for the text-to-image GAN used in the project_ _[here](https://github.com/hanzhanggit/StackGAN)._
+
+I recreated the study going from "text to photo-realistic image" with the code above. The dataset provided allowed the network to learn how to generate realistic bird images from detailed descriptions of birds.
+
+Here is a sample of my results. The text descriptions on the left were the input that produced the bird images directly to the right of them. As you can see, the images coincide with the descriptions quite well. The generated birds are also quite diverse. 
+
+![Screen Shot 2017-09-06 at 3.09.12 PM.png](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504735948987-GVASYK9Y9N7U3IC5CH5K/Screen+Shot+2017-09-06+at+3.09.12+PM.png?format=1500w)
+
+The study referenced stacked a second GAN on top of the first to continue upsampling and thereby converting the low resolution images into high-res outputs. Training individual epochs of this model took an extremely long time, even on high-tier AWS instances, so I decided to skip this phase when training on my own data. In the future, I look forward to fully implementing this step of the process with a slightly altered and hopefully quicker high-res producing GAN.
+
+Here is an example of the output of the second GAN. The images below are from the study itself.
+
+![Credit:  Han Zhang Github](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504593493089-G7S8EBCNHCFS8WGGZ4RO/Screen+Shot+2017-09-04+at+11.37.32+PM.png?format=1500w)
+
+Credit: [Han Zhang Github](https://github.com/hanzhanggit/StackGAN)
+
+### **Video GAN**
+
+The next model I ran took the images generated above as input and produced a horizontally long graphic that includes 32 sequential frames, one of which will ideally be the input image itself.
+
+![Screen Shot 2017-09-04 at 11.16.22 PM.png](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504736025989-FCAGL5HVIFNI4H957ZQA/Screen+Shot+2017-09-04+at+11.16.22+PM.png?format=1500w)
+
+_Find Torch code for the image-to-video GAN used in the project [here](https://github.com/cvondrick/videogan)._
+
+This model is able to generate videos on distinct subject matter quite well. The branching convolutional layers encourage the model to split the input image into its foreground and background components. Typically, the majority of the movement in a video occurs in the foreground. Therefore, the model replicates the static background in each frame while combining the moving foreground into the frames using a mask.
+
+The output image is then sliced into its component frames and made into a GIF.
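+
+In code, the foreground/background/mask composition and the final GIF step look roughly like this (my own sketch; array shapes, file names, and the frame duration are placeholder assumptions):
+
+```python
+# Sketch of the two-stream composition and GIF export described above (illustrative).
+# foreground: (32, H, W, 3) and mask: (32, H, W, 1), both with values in [0, 1];
+# background: a single (H, W, 3) image replicated across all 32 frames.
+import imageio
+import numpy as np
+
+def compose_video(foreground, mask, background):
+    frames = mask * foreground + (1.0 - mask) * background[None, ...]
+    return frames                                  # (32, H, W, 3), values in [0, 1]
+
+def save_gif(frames, path="generated.gif", seconds_per_frame=0.125):
+    imageio.mimsave(path, (frames * 255).astype(np.uint8), duration=seconds_per_frame)
+```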
+
+Here are examples of **train** and **beach** videos produced by the [study](https://github.com/cvondrick/videogan) itself.
+
+![train2.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594340409-GXW95YC01TISXSICUKKT/train2.gif?format=300w)
+
+![train3.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594348784-BS7M3TA5FVM031CFG00Y/train3.gif?format=300w)
+
+![train4.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594359108-JH92LI0G96WHHREE38OZ/train4.gif?format=300w)
+
+![train5.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594365537-PWLVH12UICZQ1DVDNZRY/train5.gif?format=300w)
+
+![train6.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594410640-T5FIJOW8X2XQDAK54S6Z/train6.gif?format=300w)
+
+![train1.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594428231-1BCH20SUIKLL5ZNW77RM/train1.gif?format=300w)
+
+![beach5.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594583232-0RH25J3K67RBSWXHZ5U2/beach5.gif?format=300w)
+
+![beach1.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594534682-CDMR1H8PSP8K5E6AF04H/beach1.gif?format=300w)
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594546964-V1ZNPJ80FRLWV6R9GMQK/image-asset.gif?format=300w)
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594556917-R8210M7SW2OWNUF9SCWB/image-asset.gif?format=300w)
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594573003-ZSO23JN9OWTMGCS72OFS/image-asset.gif?format=300w)
+
+![beach6.gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504594610531-LSRD5RC6GOEJWS8KRYZO/beach6.gif?format=300w)
+
+***
+
+Results
+=======
+
+Deciding when to stop training a GAN can be tricky work. You can check the content produced by the generator at different stages in the training process, but this is only helpful once relatively realistic content begins to show. What do you do beforehand? Monitoring the generator and discriminator's loss can be an additional method of evaluating the GAN.
+
+As mentioned above, a GAN's discriminator will typically begin with very low accuracy and therefore high loss. Because the GAN's generator also starts out quite horribly, the discriminator will very quickly be able to distinguish generated images from real-life ones. As the discriminator's error drops, the generator slowly begins to find ways to trick the discriminator and reduce its loss as well. Typically the generator will improve one aspect of its images at a time. This allows the discriminator to once again pick up on patterns in the generated images, label them as fake, and once again begin increasing the generator's loss.
+
+One typically hopes to stop training the GAN when the generator's and discriminator's losses begin to close in on each other and stabilize. This will ideally happen at the second local minimum of the generator's loss plot.
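+
+One way to operationalize that stopping rule is to track both losses and stop once they have converged toward each other and stopped drifting. The sketch below is only an illustration of that idea, with arbitrarily chosen thresholds, not the criterion used in this project.
+
+```python
+# Sketch of a loss-based stopping heuristic (illustrative only; thresholds are arbitrary).
+from collections import deque
+
+g_hist, d_hist = deque(maxlen=200), deque(maxlen=200)
+
+def should_stop(g_loss, d_loss, gap_tol=0.1, drift_tol=0.05):
+    g_hist.append(g_loss)
+    d_hist.append(d_loss)
+    if len(g_hist) < g_hist.maxlen:                 # wait for a full window of steps
+        return False
+    gap = abs(sum(g_hist) / len(g_hist) - sum(d_hist) / len(d_hist))
+    drift = abs(g_hist[-1] - g_hist[0])             # how much the generator loss moved
+    return gap < gap_tol and drift < drift_tol
+```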
+
+Below is a screenshot from my loss plot midway through my training process. 
+
+As you can see the generator's loss drops initially and then begins to curve upward as the discriminator picks up on its antics. Following the 6000th training step and not visualized here, the loss began to drop again. 
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504687750132-YH2GRJI37KGIROOCA20D/image-asset.png?format=1500w)
+
+Here is a sample of images produced by the first generator. As you can see they look like various frames from snippets of a romantic comedy.
+
+You can also see that the generator repeats images at different points. Those generated frames must have fooled the discriminator better than the rest, risking stagnation in the training process.
+
+![Screen Shot 2017-09-05 at 12.03.13 AM.png](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595348049-BXUPHNL00YNQP54VWUAW/Screen+Shot+2017-09-05+at+12.03.13+AM.png?format=2500w)
+
+Here is a sample of videos produced by the second generator. The first picks up on a couple dancing or hugging, while the second forms a boxy humanoid figure that, interestingly, is not found in the movies themselves.
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595562889-4T5OK6IC8YA2P6F14FK3/image-asset.gif?format=750w)
+
+![ezgif.com-crop (6).gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595569073-WB64L21T7824XRR0E4VR/ezgif.com-crop+%286%29.gif?format=750w)
+
+While individual photos and videos from the GANs created interesting content reminiscent of a romantic comedy, the overall meaning of text descriptions broke down as they went through the process. Here are three columns showing the initial text description, followed by the image output of the first generator, and the video output of the second generator. 
+
+While the image output may have picked up on some of the meaning in the text, the videos themselves entirely lose it. They do make for some fun psychedelic GIFs though.
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595116082-OSS3VWCOG48ON4LU845G/image-asset.png?format=1500w)
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595464719-PRC8MRSJV6HQXUC9YDON/image-asset.gif?format=300w)
+
+![ezgif.com-crop (9).gif](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595487414-UCO6LWXA2FYSSJPHGQQ8/ezgif.com-crop+%289%29.gif?format=500w)
+
+![](https://images.squarespace-cdn.com/content/v1/590fc266414fb521f04ed89b/1504595524901-N6OYITIGZTMS0LHILLZI/image-asset.gif?format=300w)
+
+***
+
+Limitations
+===========
+
+While the ML pipeline was complete, the results ultimately still need improvement. The causes of the breakdown in meaning from text description to video may not be fully remediable, but I believe that another run of the models with a more refined dataset can produce much, much better results.
+
+The dataset used, video clips from nine romantic comedies, was unfortunately too varied in subject matter and too low resolution from the start to provide meaningful results.
+
+While the consequences of this became apparent during the project, it was too late to turn back. Filtering the dataset for higher quality videos on less varied subject matter will ultimately require significantly more image editing work and perhaps the design of a classification model to aid in the process.
+
+Another large limitation of this project design was the disconnect between the initial text vector and the final video-GAN. While I did not have time to further modify the second GAN's framework, I see several potential ways to improve it - discussed below.
+
+***
+
+Future Runs
+===========
+
+I believe this pipeline will run quite well on higher quality videos that stick to a specific subject matter. I'm currently looking for available datasets and scrape-able material that will be good for this use.
+
+Additionally, I believe that tokenizing for the action words in the text descriptions and putting an additional focus on these word vectors during the embedding stage will prevent the contextual breakdown that this project faced.
+
+Changing the video generation model to be more like the image generation one will also improve the results. The image generation model takes into account whether the image matches its text description when deriving the loss. The video generation model needs a similar data and loss-function design.