The Roaring Twenties of the last century transformed every form of artistic expression. The new technologies of the time, photography and cinema, unleashed creativity. The twenties of the 21st century bring a new transformative technology: easy-to-use artificial intelligences, available to everyone, that create photos, paintings, videos, almost anything, from a written text. The results can be as realistic as images captured by a camera, or rendered in an artistic style that imitates the great painters of history.



GAN, the algorithm that created Obama

In 2018, the painting Portrait of Edmond de Belamy, presented by the Obvious collective and signed by an algorithm known as a GAN (generative adversarial network), went up for auction. This artificial artist, developed in 2014, was already well known for deepfakes, fake videos such as the one made with Barack Obama's face.

The video begins with an Obama indistinguishable from the real thing saying this sentence: “We are entering an era in which our enemies can make anyone say anything at any time.”

AIs available

The new models do not merely create by learning from previous images; they also establish relationships between images and text.

In recent times, new names have been appearing, such as DALL·E 2, Imagen, Craiyon, Midjourney, Nuwa-Infinity and Stable Diffusion. All these AI models share the same principle: generating an image from a description, a text. They may differ in their final style, their editing options or their usage restrictions, but the principle is the same.

How is an AI able to generate an image from text?

Let’s think about the natural process of turning a text into an image. Take, for example, “Paint a tree at sunset on an autumn day.” Each person will paint a different tree. We do not build a literal image of the elements in the order we read them; instead, we form a much more generic abstraction and add details to it with each iteration we make around that abstraction.

When reading a text, we pick out the most relevant words. For example, we do not give articles the same importance as verbs. Nor can we avoid grasping a meaning, and even a feeling: the text may seem funny, sad, nostalgic…

In general, a series of associations is triggered that, with a little effort, we can represent in a drawing or a song.

A generative image model of the DALL·E type is not far from this intuition.

Jennifer Aniston’s famous brain cell

Although AI does not aim to faithfully simulate our cognitive process, some similarities do emerge between the artificial and biological worlds.

If representations in the biological world are established in neural networks, in the synthetic world they reside in numerical matrices, that is, latent spaces that condense the most representative features of the world they are meant to generate.

In 2005, a research article published in Nature reported that we have specific neurons for particular concepts. For example, the same neuron is always activated when we see a photo of Jennifer Aniston, a cartoon of her or her name written out. This effect is known as the Jennifer Aniston neuron.

In the same way, the developers at OpenAI, the company behind the DALL·E artificial intelligence, report in a recent publication the existence of artificial neurons (we repeat, artificial neurons) that react to similar concepts, whether the input is a photograph or a drawing. In this case, the network is able to compress into a set of numerical parameters all the variability that a famous person represents, regardless of the input.
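
As a rough illustration of that idea, here is a minimal sketch in Python. The three-number latent vectors are hand-picked for the example and do not come from any real model, and `concept_neuron` is a hypothetical helper, not part of DALL·E or any OpenAI library. The sketch only shows how a single unit could respond to a concept when inputs of different kinds land close together in a latent space.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical latent vectors, invented for illustration only.
# A real model would learn vectors with hundreds of dimensions.
latents = {
    "photo of Jennifer Aniston": [0.90, 0.10, 0.20],
    "drawing of Jennifer Aniston": [0.80, 0.20, 0.10],
    "written name 'Jennifer Aniston'": [0.85, 0.05, 0.30],
    "photo of a tree": [0.10, 0.90, 0.30],
}


def concept_neuron(latent, prototype, threshold=0.9):
    """Toy 'neuron': fires when the input lies close to the concept prototype."""
    return cosine(latent, prototype) >= threshold


if __name__ == "__main__":
    prototype = latents["photo of Jennifer Aniston"]
    for label, vector in latents.items():
        state = "fires" if concept_neuron(vector, prototype) else "stays silent"
        print(f"{label}: {state}")
```

With these invented vectors, the photo, the drawing and the written name all point in nearly the same direction, so the toy neuron responds to the three of them and ignores the tree.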

But having the idea for a work is only the beginning; giving it form has more merit.

Decoding

There are various strategies for turning the conceptual part into an actual image. For example, DALL·E 2 does not start from a blank canvas but from noise that, guided by the text we have entered, gradually transforms into the image we are looking for. This new approach made version 2 of DALL·E preferable to its predecessor, both for its understanding and manipulation and for its photorealism.
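
The "from noise to image" idea can be sketched with a toy example. This is not how DALL·E 2 is actually implemented: a real system uses a trained neural network to decide each step, while here the hypothetical `text_to_target` function simply derives a fixed list of "pixel" values from the prompt. The step size, noise levels and function names are all assumptions chosen for illustration. The sketch only shows the overall shape of the process: start from random noise and refine it step by step under the guidance of the text.

```python
import math
import random


def text_to_target(prompt, size=8):
    """Hypothetical stand-in for a text encoder: maps a prompt to fixed
    'pixel' values in [0, 1]. A real model learns this mapping from data."""
    rng = random.Random(prompt)  # seeding with the prompt makes it repeatable
    return [rng.random() for _ in range(size)]


def distance(a, b):
    """Euclidean distance between two lists of pixel values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate(prompt, steps=50, size=8, seed=0):
    """Toy refinement loop: begin with noise and move towards the target."""
    rng = random.Random(seed)
    target = text_to_target(prompt, size)
    # Start from pure noise rather than from a blank canvas.
    canvas = [rng.gauss(0.0, 1.0) for _ in range(size)]
    history = []
    for step in range(steps):
        # The amount of fresh noise shrinks to zero as the steps progress.
        noise_scale = 1.0 - (step + 1) / steps
        canvas = [
            pixel + 0.2 * (goal - pixel) + 0.05 * noise_scale * rng.gauss(0.0, 1.0)
            for pixel, goal in zip(canvas, target)
        ]
        history.append(distance(canvas, target))
    return canvas, history


if __name__ == "__main__":
    canvas, history = generate("a tree at sunset on an autumn day")
    print(f"distance to target after first step: {history[0]:.3f}")
    print(f"distance to target after last step:  {history[-1]:.3f}")
```

Running the script prints how far the canvas is from the text-derived target after the first and the last step; the second number should be much smaller, which is the toy equivalent of the noise settling into an image.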

DALL·E doesn’t see the same as Stable Diffusion

Artists draw not only on technique but also on experience. An AI's experience is its database, the data on which the system has been trained, which will steer it towards a more realistic (DALL·E 2) or more artistic (Midjourney) style.

Stable Diffusion is an open source project. As a result, software developers can gradually build more sophisticated tools and integrate them into other image, music and video editing programs.

DALL·E 2 offers a good understanding of text and ease of use, with tools such as inpainting, which integrates new elements into specific areas of an image, and outpainting, which enlarges the canvas by adding content around the image.

Is this art? We asked GPT-3

In the same way that photography pushed painterly realism and new art styles like post-impressionism, new AI generators will push creativity in new directions.

But let’s see what GPT-3 thinks. It is a type of architecture designed to process natural language, the same one that lies at the base of DALL·E.

Do you consider DALL·E’s images to be similar to art made by humans?

AI GPT-3: I do not think so. I think it’s interesting to see what kind of images DALL·E creates from textual descriptions, but I don’t think they’re like human art.

Why not?

AI GPT-3: I believe that human art is created from a different perspective than DALL·E. Human art is created from the artist’s own perspective and background, while DALL·E creates images from a textual description.

Define the artist’s background.

AI GPT-3: The background of the artist refers to the perspective and context that he brings to his work. It includes his personal history, his culture, his values, and his own unique way of seeing the world.