+++
title = "About Global AI Media Alliance"
date = "2023-02-15T00:55:11+08:00"
tags = ["Artbreeder"]
categories = ["technology"]
url = "/blog/about-gaima-global-ai-media-alliance"
banner = "https://cmm.ai/ces2023vid/img/banner.jpg"
+++

# About Global AI Media Alliance

The Global AI Media Alliance (GAIMA) was established through mutual trust among its founding members. As the world comes to rely on AI, cross-platform content creation, media production and broadcasting, and business models will all change quickly and at scale. GAIMA aims to link businesses and industries involved with AI media, including:

- AI content application technologies
- Media production
- Brand marketing

GAIMA will focus on AI production broadcasting, media content, and brand marketing, which includes:

- Building a quick, frequent AI media industry interaction platform
- Promoting reliable AI media content and services
- Organizing AI-related events

AI media covers several topics, including text-to-image and text-to-video technologies.

## Text-to-Image

Text-to-image is a tool that generates an image from text input. Before the rise of machine learning, text-to-image tools worked by arranging existing images related to the input text into a collage.[1][2] Machine learning-based text-to-image models have been developed since the mid-2010s, but they gained prominence in 2022 for producing outputs that resemble real human art. Examples of text-to-image models are OpenAI's DALL-E 2, Google Brain's Imagen, and Stability AI's Stable Diffusion.

Many text-to-image model architectures have been developed over the years, but they are generally composed of a text-encoding stage and an image-generation stage. (Minimal code sketches of both stages appear at the end of this post.)

### Text encoding

Text encoding is usually done with transformer models. Alternatively, it can be done with a recurrent neural network such as a long short-term memory (LSTM) network.

### Image generation

The image generation step has generally used conditional generative adversarial networks, but diffusion models have recently become a popular alternative.

Rather than directly training a model to output a high-resolution image from text input, a popular technique is to train a model that generates low-resolution images and then use auxiliary deep learning models to upscale them and fill in finer details.

Text-to-image models are trained on large datasets of text-image pairs, often scraped from the web. Commonly used datasets include:

### COCO (Common Objects in Context)

The COCO dataset was released by Microsoft in 2014 and consists of around 123,000 images representing a diversity of objects. Each image has five captions written by human annotators.

### Oxford-102 Flowers and CUB-200 Birds

These are smaller datasets of around 10,000 images each, with topics limited to flowers or birds respectively.[3]

In 2022, Google Brain reported positive results with its Imagen model from conditioning generation on a large language model that was trained separately on a text-only corpus and whose weights were subsequently frozen. This marked a departure from the standard approach to text-to-image training.[4]
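To make the pipeline above concrete, here is a minimal sketch of the text-encoding stage. It is illustrative only: it assumes the Hugging Face `transformers` library and uses CLIP's text encoder, one common choice (Stable Diffusion conditions on a CLIP text encoder; other models use different encoders).

```python
# Minimal sketch of the text-encoding stage, assuming the Hugging Face
# "transformers" library and the public openai/clip-vit-base-patch32 checkpoint.
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer(["a photo of a red flower"], padding=True, return_tensors="pt")
# One embedding per token; downstream image generators condition on these.
embeddings = text_encoder(**tokens).last_hidden_state
print(embeddings.shape)  # (batch, sequence_length, hidden_size)
```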
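The generation stage can then denoise random noise into an image, conditioned on those embeddings. The loop below is a schematic of diffusion sampling, not any published algorithm: the denoiser is an untrained stand-in and the noise schedule is simplified so the sketch stays short and runnable.

```python
# Schematic sketch of diffusion-based image generation conditioned on a text
# embedding. ToyDenoiser is an untrained stand-in so the loop actually runs;
# the step rule and noise schedule are simplified for illustration.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Untrained stand-in for the U-Net a real diffusion model would use."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, t, text_emb):
        # A real denoiser would inject t and text_emb (e.g. via cross-attention).
        return self.net(x)

@torch.no_grad()
def sample(denoiser, text_emb, steps=50, shape=(1, 3, 64, 64)):
    x = torch.randn(shape)                        # start from pure Gaussian noise
    for t in reversed(range(1, steps + 1)):
        eps = denoiser(x, t, text_emb)            # predicted noise at step t
        x = x - eps / steps                       # take a small denoising step
        if t > 1:
            x = x + torch.randn_like(x) * (t / steps) * 0.01  # re-inject noise
    return x.clamp(-1, 1)

image = sample(ToyDenoiser(), text_emb=torch.zeros(1, 77, 512))
```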
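The upscaling trick described earlier can be sketched as a two-stage cascade. Again, both stages here are untrained placeholders; in a real system (Imagen and DALL-E 2 both use cascades of this kind) each stage is a trained, text-conditioned model.

```python
# Sketch of the cascade technique: a base model emits a low-resolution image
# and an auxiliary super-resolution model fills in finer detail.
import torch
import torch.nn.functional as F

def cascade(base_model, upscaler, text_emb):
    low_res = base_model(text_emb)        # stage 1: e.g. a 64x64 image
    return upscaler(low_res, text_emb)    # stage 2: upscale and add detail

# Untrained placeholders so the sketch runs end to end:
base_model = lambda emb: torch.randn(1, 3, 64, 64)
upscaler = lambda img, emb: F.interpolate(img, scale_factor=4, mode="bilinear")
print(cascade(base_model, upscaler, torch.zeros(1, 77, 512)).shape)
# torch.Size([1, 3, 256, 256])
```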
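On the training side, text-image pairs from COCO can be read with torchvision's `CocoCaptions` dataset; the paths below are placeholders for a local download of the 2014 images and caption annotations.

```python
# Sketch of reading text-image training pairs from COCO with torchvision.
# The root and annFile paths are placeholders for a local download.
from torchvision.datasets import CocoCaptions
from torchvision import transforms

dataset = CocoCaptions(
    root="data/coco/train2014",
    annFile="data/coco/annotations/captions_train2014.json",
    transform=transforms.ToTensor(),
)

image, captions = dataset[0]  # one image and its five human-written captions
print(image.shape, captions[0])
```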
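Finally, the frozen-encoder approach reported for Imagen amounts to encoding text with a pretrained language model whose weights never receive gradient updates, so only the image-generation stack is trained. Imagen used a large T5 encoder (T5-XXL); the sketch below uses `t5-small` purely to keep the example light.

```python
# Sketch of the frozen text encoder: the language model is pretrained on a
# text-only corpus and frozen, serving only as a conditioning signal.
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")
encoder.requires_grad_(False)  # freeze: the text encoder is never updated

tokens = tokenizer(["an oil painting of a corgi"], return_tensors="pt")
text_emb = encoder(**tokens).last_hidden_state  # conditioning for the generator
```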