Van Gogh of the computer world: an image generation program

In the previous article we looked at (or, for some readers, revisited) how machines were taught to understand our speech. These days a program or robot that can talk to a person hardly surprises anyone. But what if a person tells the robot, "draw me a sunset on the seashore"? Can the robot understand the request and draw it? Now it can, thanks to the work of Xiaodong He and his team of artificial intelligence researchers. How their system works, and whether you could open an art gallery with its output, is what we will find out below. Let's go.

Briefly about the essence

To oversimplify, the program works on the principle "I draw what I am told". You say "purple teapot with a long spout", and the program tries, pixel by pixel, to depict that description as accurately as possible; in other words, to visualize your words. To do this it must first understand what you are saying, and then identify the most important words and represent them as vectors, from which it can infer what the required image should look like.

Xiaodong He puts it this way: “... If you use Bing and look for a bird, you will get a picture of a bird. But in our case, the picture is created by a computer, pixel by pixel, from scratch ... These birds may not exist in the real world - they will be an example of a computer representation of birds ...”

This is not the first project for Xiaodong He and his colleagues. They previously developed CaptionBot, an AI system that automatically writes descriptions for photos. There was also SeeingAI, a system that answers people's questions about a photograph. According to the researchers, the latter is well suited to blind and visually impaired people.

The artist-on-demand project is built on a Generative Adversarial Network (GAN), which combines two components: a generator, which produces the image, and a discriminator, a module that critically assesses the quality of the resulting image.
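The tug-of-war between generator and discriminator can be sketched with the standard GAN losses. This is a minimal illustration, not the researchers' code: the function names are hypothetical, and d_real and d_fake stand for assumed discriminator outputs, that is, the probability in (0, 1) that an image is real.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the critic.

    The loss is low when real images score near 1 and generated ones near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when the critic scores the fake as real."""
    return -math.log(d_fake)

# Hypothetical scores at two points in training (made-up numbers).
early = generator_loss(0.1)  # the critic easily spots the generated image
late = generator_loss(0.8)   # the generated image is now convincing

sharp_critic = discriminator_loss(0.9, 0.1)    # confident and correct
unsure_critic = discriminator_loss(0.6, 0.4)   # barely better than guessing
```

Each side is trained to lower its own loss, so an improvement in one raises the pressure on the other; that competition is what pushes the generated images toward realism.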

Another important component of the system is a mathematical model of human attention. When we see an object, we focus, often subconsciously, on certain of its external features. The same thing happens when someone describes an object to us. If we are told about lemons, we picture small, oval, yellow fruits; we do not picture peaches. This happens in our brain automatically. A machine's "brain", however, has to be programmed to work at least a little like a human one. Since the language a machine understands best is mathematics, the researchers turned the phenomenon of attention into mathematical formulas. Now let's look at each component of the system in more detail.

Attentional Generative Adversarial Network

According to the researchers, their GAN differs from similar systems in its attention to detail. A regular GAN encodes the entire sentence (for example, "purple teapot with a long spout") as a single vector that guides the rendering. In this system, attention is paid to individual words, which become the guiding vectors for individual regions of the image. Simply put, the program does not draw the whole picture at once but divides it into pieces (like a jigsaw puzzle) and draws each of them separately.
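A toy example shows what a single sentence vector can lose. It is only a sketch under loud assumptions: the two-dimensional embeddings are made up, and averaging the word vectors is the crudest possible sentence encoder (a real text encoder is order-aware, though it still has to squeeze everything into one vector).

```python
# Made-up 2-D word embeddings, for illustration only.
EMB = {
    "black": [1.0, 0.0],
    "white": [0.0, 1.0],
    "bird": [1.0, 1.0],
    "beak": [-1.0, 1.0],
}

def sentence_vector(words):
    """Collapse a description into one vector by averaging its word vectors."""
    dims = len(EMB[words[0]])
    return [sum(EMB[w][d] for w in words) / len(words) for d in range(dims)]

def word_vectors(words):
    """Keep one vector per word, in order."""
    return [EMB[w] for w in words]

a = ["black", "bird", "white", "beak"]
b = ["white", "bird", "black", "beak"]

# The averaged vector cannot tell the two descriptions apart...
same_global = sentence_vector(a) == sentence_vector(b)
# ...while the per-word sequence still can.
same_words = word_vectors(a) == word_vectors(b)
```

Here same_global is True and same_words is False: the detail of which colour belongs to which part survives only at the word level.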

To explain in simple terms how the mathematical system works, let's imagine that our sentence (description for the image) is a formula, and words are variables.

Schematic representation of the program algorithm

Each word is an important vector; that is, it determines the direction in which the program will "think". First, the system must pick out the most significant words. It then tries to match words to separate regions of the future image. Take "a blue bird with a black beak": the word "black" relates to the beak, a separate region of the image.

Having determined the vector for each word, the program gathers all this information into a matrix, which it then turns into an image.
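The matching of words to image regions can be sketched as dot-product attention. This is a simplified illustration, not the published model: the region and word vectors are invented, the helper names are hypothetical, and a real system learns these vectors rather than having them written by hand.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(region, word_vecs):
    """Return (weights over words, word-context vector) for one image region."""
    weights = softmax([dot(region, w) for w in word_vecs])
    dims = len(word_vecs[0])
    context = [sum(wt * w[d] for wt, w in zip(weights, word_vecs))
               for d in range(dims)]
    return weights, context

# Invented vectors for "a blue bird with a black beak".
words = ["blue", "bird", "black", "beak"]
word_vecs = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
regions = {"body": [1.5, 0.5, 0.0], "beak": [0.0, 0.5, 1.5]}

# One row of weights per region: together they form the attention matrix.
attention = {name: attend(vec, word_vecs)[0] for name, vec in regions.items()}
top_word = {name: words[w.index(max(w))] for name, w in attention.items()}
```

With these made-up numbers the body region attends most to "blue" and the beak region to "black", which mirrors the example in the text.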

As an example, the researchers suggest looking at four requests of different kinds (picture above).

Consider the first three images of the bird. As you can see, they differ greatly in quality and detail. The first frame (blurred and inaccurate) is what the generator produces when the entire sentence is treated as a single vector. In the second frame the bird is clearer, because the sentence was split into separate words (vectors), which made it possible to refine some details (for example, the black eye ring).

The same images are shown below with the individual regions that correspond to particular words highlighted; the program generates these regions and then assembles them into a single whole. The last frames show exactly which words in the description the program considered most important.

This set shows the results of image generation when the two most significant word vectors are extracted from a sentence (black + white, red + yellow, blue + red).

The results of more unusual queries are presented in the image above. The "fluffy black cat floating on the surface of the lake" is almost unrecognizable, although the lake itself is depicted very well. The same goes for the road signs. The second image, however, came out almost right ("red double-decker bus floating on the surface of the lake"). The only catch is that it shows not a bus but a boat or yacht.

The results of such experimental queries show only that the system still has a lot to learn. In particular, its knowledge base needs to be updated constantly so that it knows what a given object looks like. Still, despite all the inaccuracies and flaws in the generated images, the system is impressive. Its range of applications is wide: from helping with interior design to creating animated films straight from a script. Combined with a facial recognition system, the image generation program could also be used by law enforcement agencies, for example to compile a composite sketch of a suspect.

This article has covered the basic principles and the essence of a program that generates images from their descriptions. For a more detailed look at the mathematics behind the algorithm, you can download the corresponding research report.

- Is it very difficult to paint?
“It's either easy or impossible.”
(Salvador Dali)


Source: https://habr.com/ru/post/409785/
