Synthetic Media in Networked Image Cultures
Lecture by Roland Meyer
While the initial hype surrounding AI image synthesis models such as Dall‑E, Midjourney, or Stable Diffusion is fading, their massive impact on networked image cultures is becoming ever clearer: Social media platforms are flooded with increasingly absurd synthetic clickbait, right-wing online accounts have become super-spreaders of AI-generated propaganda, and a pervasive distrust of digital images has popularized new forms of pseudo-forensic analysis.
Taking its cue from these current phenomena, the lecture attempts to map the emerging synthetic realities of generative AI, analyze their specific aesthetics, and trace their infrastructural conditions. Not only are the visual worlds of synthetic media made to be shared, liked and commented on via social media platforms – they are in fact a product of these very platforms, their data accumulations, filter aesthetics, reaction economies and monetization schemes.
More than anything else, AI images may thus represent platform capitalism itself, albeit in a dream-like state: a closed world composed of endlessly repeating patterns from the past, fueled by constant feedback loops and optimized for quantifiable user engagement.
Tue May 14, 2024, AdBK München, Emergent Digital Media class
Read full transcript (generated by Whisper)
So please join me in welcoming Roland Meyer. Thank you. Applause It's a pleasure and it is an honor to be the first in this series here in Munich, and I'm also very glad to see so many people, also some familiar faces here in the room, despite the wonderful weather outside. You came here to look at some quite ugly images, actually, beginning with the one you've probably already seen online, which was shared by Sam Altman, CEO of OpenAI, on September 20th last year on the platform now called X. And what he posted with it was the prompt that you can read here: an illustration of a human heart made of translucent glass, standing on a pedestal amidst the stormy sea, rays of sunlight pierce the clouds, illuminating the heart, revealing a tiny universe within. The quote, the universe within you, is etched in bold letters. And that's what this image actually more or less shows. I find it very tempting to read this image as a kind of allegory for generative AI, or at least for what people like Altman want us to believe that generative AI is. And I think it's very obvious that this is a kind of religious image, that is, an image that plays with Christian iconography, just like the Sacred Heart, that symbol of God's boundless love for humanity. It's also a very esoteric image that alludes to ideas you can find in Sufi mysticism, but also in countless New Age-inspired self-help books and websites that all teach you to discover the divine principle of infinite flow within your own soul, the universe within you. And of course it's an image that plays with very classical ideas of the sublime. You can think of romantic images of the stormy sea and of shipwrecks
with a spectator, Schiffbruch mit Zuschauer, the famous Hans Blumenberg topic. So all this is combined into an allegory of boundlessness, of self-help, of the infinite, of the ever-flowing, and of the sublime. And that's very much in line with how Silicon Valley now likes to think about what they call artificial intelligence, or maybe better, machine learning. Metaphors of endless liquidity, of boundless flows, of floods, and of the coming wave are everywhere in the current AI hype. You can take Altman's colleague Mustafa Suleyman, CEO of Microsoft AI and author of this widely read new book that is tellingly called The Coming Wave. He only recently stated that AI, for him at least, now feels like an endless river of creativity, making poetry and images and music and video that stretch the imagination. So it's this ideology of endlessness beyond any imagination that Altman's tweet is supposed to resonate with, I think. AI is here a sublime, natural force that is coming over us and that we can only react to in awe and in devotion, basically.
So it's quite tempting to poke fun at this kind of pseudo-religious iconography and this kind of kitschy Photoshop-poster aesthetic and see it as a symptom of the quite juvenile aesthetic sensibilities of these Silicon Valley tech bros. But in a way, that would be to miss the point. Because this image, after all, is not so much about what it represents and what it's supposed to mean, but about what it's supposed to do. It is a test image. It is a demonstration, in this case, of DALL-E 3's capabilities, especially its ability to render legible text. That was something older diffusion models like DALL-E 2 and Midjourney had quite some problems with, and which, as this image shows us, DALL-E 3 was now able to do: render legible text. And that's why it was so important for Altman to include the prompt in the tweet, to demonstrate how accurately DALL-E was able to visualize this quite complex description it was given. This image is meant as a rendering of the text. In fact, as this post by OpenAI shows, the text originally was a bit longer than the version Altman then retweeted, and it ended with etched in bold letters across the horizon. So as you can see, the letters are not across the horizon, they are on the pedestal, so DALL-E 3 was not able to completely understand the prompt; nevertheless, it was able to reproduce the text. Also, you can ask yourself, is it really a heart made of translucent glass? Is that glass, really, or is it rather a heart made of stone? I'm not quite sure, but it's a heart, at least. So in the end this kind of lengthy, complex wording of the prompt seems more like an abracadabra, kind of a magic formula and a conjuring trick that
is designed to distract you from the very fact that the model doesn't quite understand basic spatial relationships. And testing is always a serial process. If you test something, you have to repeat and repeat the testing. So this image was part of a whole series of images that were designed to show you how DALL-E is now able to translate nuanced requests into extremely detailed and accurate images, as it says here in this post. The reason I've spent so much time talking about this and the other images is that for me they are a perfect example of what I've called, taking up a term originally coined by my friend and colleague Jacob Birken, platform realism. Platform realism, in this conception, is a second-order aesthetic of generic images: images not so much supposed to represent any kind of even fictional reality, but rather to visualize generic concepts based on patterns derived from masses of already existing and already labeled images, and optimized to cater to consumer expectations.
In other words, there is a reason these images look the way they look. And these aesthetics have to do with their very infrastructural conditions. These images are not only meant to be shared on social media, but they are in a very direct sense a product of social media platforms and their infrastructural logics and of the platform capitalism that is behind these services and networks. So I have three chapters, which will be shorter and shorter. The first is quite long, I have to admit. And then it's getting shorter. I think it will be like 40 minutes from now, hopefully. And I hope you will stay that long with me. The first is called Loops of Legibility. So these images I showed you are, I think, already the most recent ones in a historical series of promotional test images. And the first ones were these, made to announce DALL-E without a number, in 2021: the now quite famous avocado armchair. That's now rather unimpressive, but it then caused quite a stir.
And I think it still shows some of the basic premises of text-to-image synthesis. These are not so much images of objects, but visualizations, as I already said. They are images of concepts and their combinations, optimized for legibility. And these images are successful insofar as they are recognizable, as they are legible, as we can recognize the avocado-ness of this image as well as the armchair-ness of this image and read it as a visualization of the prompt. The same goes for an image like this, which was one of the more prominent promotional images for DALL-E 2. And it's a very interesting image, because now it shows: okay, it can not only combine and merge concepts like astronaut and horse, but it can also visualize certain relationships between these concepts, like riding. And what's more, it can simulate the aesthetic qualities of certain visual media, like photography in this case. So this is an image that is not only, one can say, optimized for legibility, but also for plausibility. It is successful insofar as it fulfills the visual expectations that we associate with this prompt. Antonio Somaini has written a very good article about text-to-image synthesis, among other things, where he points out how closely connected text and image, or in Foucault's terms, the sayable and the seeable, are in these new AI visual cultures.
In his words, what can or cannot be said, what can or cannot be written in a prompt, determines what can or cannot be visualized and seen. And this, I think, is true for many of the virally successful images we have seen in the last year or so. Rather than depictions of objects, people, scenes, and events, they are visualizations of sayable and writable concepts and their combinations. And you all probably remember at least one or two of these images. These images are made for the digital world. They are immediately legible. They perfectly match our expectations and associations in relation to pre-formulated verbal descriptions. And that's what makes them so virally successful, in some way. In a very basic sense, text-to-image synthesis is a kind of pattern recognition in reverse. In pattern recognition, the task is always to label images: to have lots and lots and lots of images and say, okay, which of these images are images of avocados? Which of these images are images of armchairs? And now, on a conceptual level, this process is reversed, and you give labels and generate images that match these labels.
But the prerequisite in both cases is that these models have been trained with large quantities of already labeled images. Large quantities, for example, of avocado images. So to understand the relationship between images and text, these synthesis models rely on billions of image-text pairs. And that means they rely on the fact that in today's networked image culture, every image we encounter online is already surrounded by, one could say, a cloud of text. Especially alt text descriptions for the visually impaired, but also captions, editorial content, all kinds of descriptive data and metadata associated with an image file, which can be scraped from the web. So if words and images have become, as has been pointed out, inseparable in AI image synthesis, the main precondition for this is that they are already very closely linked in today's online visual culture. And that has to do with platforms like these, for example. Among the most important sources of publicly available training data are online platforms such as Flickr, Pinterest, Shopify, Amazon Web Services. These platforms are especially valuable resources for scraping training data because most of the images found on them are already labeled, and they are also already optimized for the aesthetic expectations of online consumers.
However, the labels and the alt texts found on these platforms are often not mere descriptions, but rather text optimized for search engines. As Christo Buschek and Jer Thorp have pointed out, users of Shopify, for example, a platform designed to host your own web shop, often use alt text descriptions to increase their Google PageRank scores. Alt texts most often describe what the site owners want algorithms to see, not what they want humans to see. So these text descriptions that form the basis of the training of these models are not merely descriptions; they are already very much part of a commercial online culture. Also, it's not the case that all of the images available online fuel these training databases. In fact, these training databases, such as LAION-5B, which is the most prominent one and was used both for Midjourney and Stable Diffusion, are highly curated, statistically curated, as Thorp and Buschek say, and they are also highly filtered. And one filter mechanism is that neural networks, machine learning, are used to decide whether an image and a description are similar enough. So from all the images scraped from the web, only those images that rank high in terms of similarity, or matching, between text and image actually enter the database.
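To make that filtering step concrete, here is a minimal sketch of CLIP-based curation of the kind described above. It assumes the open_clip library and a pretrained CLIP model; the 0.28 cutoff is the similarity threshold reported for LAION's English subset and is used here only as an illustrative default, not as a claim about any particular production pipeline.

```python
# Sketch: keep a scraped image-text pair only if image and caption embeddings
# are similar enough according to a pretrained CLIP model.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def keep_pair(image_path: str, alt_text: str, threshold: float = 0.28) -> bool:
    """Return True if the pair clears the text-image similarity cutoff."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer([alt_text])
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    similarity = (img_emb @ txt_emb.T).item()  # cosine similarity
    return similarity >= threshold
```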
Which means that you already have a kind of testing of legibility. And what is this testing of legibility based on? These are models that are basically based on older training data. And this older training data is based on, again, older models and older training data, dating back, for example, to databases like ImageNet that were built already around 2010 with the massive work of human clickworkers. And these kinds of models upon models upon models now, in feedback loops of matching images and text, decide which images are actually used for training these models. But the images are not only filtered for how well they supposedly match the description, they are also filtered for how good they supposedly are: they are filtered for their supposed aesthetic quality. So LAION-Aesthetics, a subset of LAION-5B used for optimizing image generation models, comes with an aesthetic score assigned to each image.
To give an image an aesthetic score from one to ten, again, a machine learning system is employed that has learned from images that were ranked and scored by humans. These were images that were also scraped, for example, from certain websites where people ranked and scored photos. So, based on clickwork, these images scored and ranked by humans now fuel a machine learning model, which learns to predict which images are aesthetically pleasing to human viewers, in order to decide which images enter the training database for another model that is supposed to generate new images that are supposed to be aesthetically pleasing to humans. So what are these images? If you look into LAION-Aesthetics, the images with the highest aesthetic scores are not photographic images, they are not abstract images, but they are very much representational images, mostly images with a certain atmospheric quality, a certain air of everyday artsiness that you could find on platforms like DeviantArt, for example, and they are also very often images of young white women.
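Schematically, such an aesthetic filter can be imagined as a small regression head on top of CLIP image embeddings, trained on human ratings from one to ten and then used to gate the dataset. The layer sizes and the cutoff value below are illustrative assumptions, not the actual LAION-Aesthetics configuration.

```python
# Sketch: predict an "aesthetic score" from an image embedding and keep only
# images whose predicted score clears a cutoff.
import torch
import torch.nn as nn

class AestheticPredictor(nn.Module):
    def __init__(self, embedding_dim: int = 512):
        super().__init__()
        # maps a CLIP image embedding to a single predicted rating (1-10)
        self.head = nn.Sequential(
            nn.Linear(embedding_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(image_embedding).squeeze(-1)

def filter_by_aesthetics(embeddings: torch.Tensor,
                         predictor: AestheticPredictor,
                         cutoff: float = 6.5) -> torch.Tensor:
    """Return indices of images whose predicted score clears the cutoff."""
    with torch.no_grad():
        scores = predictor(embeddings)
    return (scores >= cutoff).nonzero(as_tuple=True)[0]
```

The taste of past human raters is thus baked into what the next model is allowed to learn from.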
So what you have is a recursive algorithmization of taste. And that is even done on another level with another subset called LAION-POP, which was introduced last year in November and which takes the most popular concepts used in Midjourney images. So it analyzes Midjourney images for which concepts they are based on and then uses these concepts as a filter for the five billion image-text pairs, so that you get the images that most likely match the expectations and aesthetic preferences of Midjourney users, to then have a data set that can be used to optimize the next version of Midjourney, for example. So you have feedback loops upon feedback loops of legibility and algorithmization of taste, filtering all the images that could possibly be used down to a very distinct subset that then decides which images can actually be produced.
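As a rough illustration of this kind of popularity-based filtering, here is a hedged sketch: count which concepts occur most often in a corpus of Midjourney prompts, then keep only those image-text pairs whose captions mention one of them. The whitespace tokenization and the cutoff of 10,000 concepts are illustrative choices, not the actual LAION-POP procedure.

```python
# Sketch: filter (image_url, caption) pairs by the concepts that are most
# popular in a corpus of Midjourney prompts.
from collections import Counter

def top_concepts(midjourney_prompts: list[str], k: int = 10_000) -> set[str]:
    """Collect the k most frequent words across the prompt corpus."""
    counts = Counter()
    for prompt in midjourney_prompts:
        counts.update(word.lower() for word in prompt.split())
    return {concept for concept, _ in counts.most_common(k)}

def filter_by_popular_concepts(pairs: list[tuple[str, str]],
                               concepts: set[str]) -> list[tuple[str, str]]:
    """Keep pairs whose caption contains at least one popular concept."""
    kept = []
    for image_url, caption in pairs:
        words = {w.lower() for w in caption.split()}
        if words & concepts:
            kept.append((image_url, caption))
    return kept
```

The point of the sketch is only the loop structure: users' past preferences become the filter through which the next model sees the archive.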
Now, I won't talk in detail about how diffusion and reverse diffusion work, but it's important to remember that all the images in these training databases are not present to the model in the process of image synthesis or image generation. Only the highly compressed patterns derived from these training images are what these models actually learn. And then they try to find these patterns in a field of noise. And what happens there is, again, an iterative process, a process of feedback loops, one could say, where in each step the model tries to find a certain visual pattern associated with a certain linguistic concept in a field of noise, and then tries to intensify these patterns in order to strengthen the match between text and image. So what you have is a testing feedback loop where images are tested against how much they already match a description. The image is tested again and again and is further optimized in every iteration. That would be less problematic if it was only about avocados, but of course as soon as you get to cultural concepts and their representation, you get all kinds of problematic results.
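A minimal sketch of that iterative denoising loop, in schematic PyTorch: the names denoiser, encode_prompt and scheduler are placeholders standing in for whatever a concrete diffusion pipeline provides, not a real API.

```python
# Sketch: reverse diffusion as an iterative loop. Starting from pure noise,
# the model repeatedly estimates what, given the text condition, still looks
# like noise and removes a little of it, so the pattern matching the prompt
# is intensified step by step.
import torch

def generate(prompt: str, denoiser, encode_prompt, scheduler,
             steps: int = 50) -> torch.Tensor:
    text_condition = encode_prompt(prompt)       # text -> embedding
    image = torch.randn(1, 3, 512, 512)          # start from pure noise
    for t in scheduler.timesteps(steps):         # from very noisy to clean
        predicted_noise = denoiser(image, t, text_condition)
        image = scheduler.step(predicted_noise, t, image)  # remove some noise
    return image
```

Each pass through the loop nudges the image toward whatever pattern the model associates with the prompt, which is the feedback-loop character described above.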
So how, for example, does a house look for these models? This is one paper that shows it very clearly: if you don't specify what you mean by house, you get an American suburban house. If you don't specify what kind of flag you want, you get an American flag. And if you don't specify what kind of city you want, you get a high-rise city that looks very much North American. And that is a big topic, and I think you all know about the problematic, but just to introduce it: it becomes even worse when it's about the depiction of people, especially those of non-Western origin. There are many articles that have shown this. This is one I really like, by Rest of World, which, for example, showed how Midjourney represents the prompt 'an Indian person'. As you can see, it's almost all old men with turbans, an absurd cliché, of course. And these kinds of cultural biases and sexist biases and racial biases have been widely addressed.
My point would be: maybe these biases are not only a bug, not only something to be fixed, but maybe they are actually a feature of these models that attracts certain kinds of consumers. And one hint would be that if you look at how, for example, Midjourney in its version 5 and its later-released version 5.2 represents the prompt 'family', you can see a very clear development. With Midjourney version 5, the model opted for a kind of stock-photography-like look, very much also represented people who could be read as Asian, and had a much higher degree of diversity in the depiction of people. That was then kind of rolled back with the later versions: Midjourney again introduced whiteness as a norm of representation for prompts like 'family' and also went back to this kind of illustrative, kitschy, pseudo-fantastic, painterly style that obviously is something that is successful with the user base of these models. Now, that is a commercial decision, of course. Other models, as you may know, treat these problems differently. DALL-E, for example: if you ask DALL-E for a family, and you do that via ChatGPT, it interprets your prompt and says, okay, I think what you want is, for example, a photo of a diverse family in a park. So it already transforms your prompt into something
that matches the corporate standards of diversity that OpenAI now wants to implement. Google's model went even further: there, the mitigation worked via silently adding something to your prompt. So if you asked for an image of a pope, it gave you a very diverse idea of what a pope could look like. And as you may have seen, that caused an online outrage, especially among right-wing accounts, who labelled this a form of woke censorship and so on, because the software deliberately tried to produce a result that was not matching their expectations. And you can find countless of these accounts not only making fun of that but being really outraged about it. And Google actually then shut down the model, and it is not available for consumer use at the moment. What interests me is not so much this maybe too easy and faulty way of mitigating bias, which does not address the problem, because it corrects the user rather than the model: it corrects the prompt rather than the underlying data structure. What interests me more is this kind of right-wing reaction, because you can see in it what the expectations of right-wing actors towards these models are. So Elon Musk also kind of joined this discussion with tweets like this, where he called these models insane, racist and anti-civilizational. And what that actually means is: for him the problem is the, however faulty, attempt to mitigate bias, not the bias itself. Because for these people, if AI is a technology
to identify, intensify and reproduce racist and sexist stereotypes, this is actually not an argument against AI; it's rather an argument for AI. And you can see that, for example, in tweets like this: without censorship, AI will simply be a tool to detect patterns. And what does this guy mean by that? He means: detecting patterns means AI will give my stereotypes, will give my racist assumptions, an aura of respectability, of statistical objectivity. So in a way these right-wing actors see AI as a kind of truth machine. AI, machine learning, shows the reality of racial stereotypes that, in their view, the woke left wants to censor. And they are happy when these models actually reproduce their stereotypes, for example of Democrat protesters with blue hair; then they have it all figured out. So that's a very common view among online right-wing actors of AI as this kind of truth machine showing us the reality of stereotypes. And for me, one of the most interesting and telling examples of this is a post by the Junge Alternative Baden-Württemberg.
So, the youth organization of the neo-fascist Alternative für Deutschland, which claims that real women are right-wing. And they illustrate it with an obviously not real woman, a fake synthetic image of a woman, which for them nevertheless tells the truth about how women should be and how women should look. So the ideology is: these tools visualize what everybody already is supposed to know. And that's their claim to realism. And that's especially telling in a recent interview with a social media expert working for the party about how they use generative AI. They wanted to illustrate, with these kinds of share pics, the fearful atmosphere people supposedly experience on Christmas markets, but they couldn't find any images of horrifying Christmas markets. Because, as he says: we only have positive, beautiful pictures of Christmas markets. That's fine, but it doesn't necessarily reflect reality. That's why we used Midjourney. So you use Midjourney to get to a reality that is not photographable, that is not really out there, but that is felt.
And that is known to be true for these people: a reality based on perceived concepts and slogans. And strategic images, which allow you to easily combine concepts such as Christmas market and barbed wire, make reality bend to the ideological descriptions of it. So platform realism, that would be the thesis, shows us a world optimized for pattern recognition and legibility, a world in which everything and everyone matches a pre-formulated label, a world in which mere slogans can be transformed into visible reality. And this, as I wanted to indicate, is not at all politically neutral. It's deeply ideological. So the second, a bit shorter, part I call Prisoners of the Past. And I want to start with this image and this exhibition, an exhibition by Hollywood director Bennett Miller, who became famous for movies like Capote or Foxcatcher, and who in 2023 announced his first fine arts exhibition at no other place than Gagosian Gallery in New York. Miller had been working on a documentary about so-called artificial intelligence
for almost five years, had exclusive early access to DALL-E, and was kind of an embedded artist at OpenAI, one could say. And so in this exhibition he printed images like these and other works created with DALL-E as limited edition pigment prints. So Miller's foray into the world of high art was kind of a new career, but it was also a kind of publicity coup for OpenAI, further fueling the hype machine behind DALL-E at that stage. What interests me is that Bennett Miller chose not to present a vision of the future, but a variation of the past. All the works in this exhibition more or less looked like photographs, especially like photographs from around 1900, pseudo-pictorialist photographs with these kinds of dreamlike motifs rendered in soft sepia tones and with a blurriness that makes the majority of details rather disappear. And I think the blurriness of these images had several effects and reasons. First, it made the glitches and artifacts that were very much prevalent in the earlier versions of DALL-E disappear, the kind of twisted hands and so on. Secondly, these images looked very different from the promotional images that OpenAI produced and many other online accounts distributed. They looked, in fact, artistic in a very, very conventional way.
But thirdly, and that's maybe the most interesting thing, it kind of illustrated what I would call a key paradox of so-called generative AI. To create an image of the present or even the future, AI image synthesis can only interpolate data from the past. And that means that AI image synthesis is a kind of backward prediction. It makes plausible guesses about what could have been, and therefore it's structurally not only conservative, but I would say even nostalgic. And you can find many people, especially coming from art, design and photography backgrounds, using these generative AI tools to produce images like this: images that look like photographs from a different era, images that try to conjure up memories of a past that never existed, somewhere between made-up memories and half-forgotten dreams. But what idea of the past is actually implemented in these AI models? It's first and foremost the idea of a past that is readily available and exploitable as a resource. It's an extractivist view of the past. So representations of the past, all these training data images, become a field to be mined for visual patterns. And these patterns can then be endlessly recombined, thereby losing any historical or cultural specificity.
And for me, a very fitting illustration of this kind of view of history are these, I suppose you know them, time traveler selfies, which went viral more or less last year. There's a whole series of them. And they all have the same concept, the idea that you can use generative AI to travel to the past. And what does travelling to the past mean? It means you take the past and imagine it as something that matches the representational formats and the aesthetics of today's online culture, in this case meme culture and selfie culture combined in one image. So you turn the past into something that is readily available, something to be clicked on. And we can already witness how these kinds of alternate historical timelines are spreading across the internet, with artificial images like this AI-generated selfie of Tank Man at the top of Google Image Search at one point. So again, images like this, which project the representational forms of today's social media into the past, are for me actually emblematic of the very problematic extractivist model of history promoted by the current AI hype. In a way, the virtual archive of past images in its supposed entirety, although it's not the entirety, is now available as a resource just waiting to be appropriated, exploited and adapted to the aesthetic expectations of the present.
And what this does is not only exploit the past in a way, but, to quote Grafton Tanner, to foreverize it. So in his book on foreverism, Tanner says we are past nostalgia; we are now at a stage where the past is actually not allowed to end any longer. It doesn't end, it goes on endlessly. Further, foreverism maintains that the old can't be merely preserved or re-released, it must be revived, given new stories, de-aged to provide the illusion of vitality, updated, rebooted. So in many contemporary uses of generative AI, this desire for visual reanimation is merged with another desire. The one desire is visual reanimation, to revive the past. The other is to fill in the gaps of the archive, to fill in the gaps of the past, to kind of re-imagine the images that are supposedly lost or have never been made in the first place. And there are many projects between art, scientific communication, design and so on that actually try to do exactly that. An especially telling one, I think, is this project Versäumte Bilder, which could be translated as missed, missing or omitted images. That was a project where Midjourney was used to produce portraits of important but partly forgotten, not all of them, female scientists. And it's a project one could be very supportive of.
It's a project to tell the stories of remarkable women to a broader audience. Nevertheless, one could be very skeptical about the means by which this was done. So if you take, for example, this AI-generated image of Lise Meitner, what you see here is not Lise Meitner, it's not even an individual, it's a generic fantasy bearing only a vague resemblance to the actual Lise Meitner. And all the press articles will tell you that there are not many authentic portraits of Lise Meitner, which is simply not true. There are many images of her. So there is actually no need for this image, other than that you don't get an image that is so instantly Instagrammable, so instantly attractive, and so instantly fit for today's social media culture as this one from the historical Lise Meitner. And it's no coincidence that these images were first published, as far as I know, on LinkedIn. So they are made as a kind of eye-catching clickbait, ready for these kinds of online platforms. And what they show is a kind of glamorized, foreverized version of a past catering to contemporary visual expectations. And it seems to me even more troubling as, for example, the story of Elisabeth Schiemann, a German botanist who resisted the Nazi regime and actively supported those who were persecuted, is a truly inspiring story and it needs to be told. But none of these images that you see here actually show her.
What you see here are generic visualizations of a past that never was and that has never been photographed. So not unlike the fake pictorialism of Miller, what makes these images successful and plausible is how they reproduce certain visual qualities, styles, atmospheres and moods associated with a certain idea of the past. A past that is always already mediated by images, historical documents, as well as movies, TV series and computer games. What these images evoke, in other words, is what one could call, with Fredric Jameson, a pastness rather than the past. Not a historical past of documented events, but an atmosphere of historical plausibility based on aesthetic vibes, which now can be extracted from masses of digital images and turned into endlessly reproducible patterns. This image is especially telling: Elisabeth Schiemann was born in 1881, so she was 20 in the year 1901, but nothing here looks like 1901. It's an image of a completely imaginary past without a date and without any historical weight, so to speak. So the thesis here would be that platform realism transforms history into a series of variations of vibes, extracted as legible visual patterns both from historical documents and from popular cultural fiction. So rather than being a tool for resurrecting a lost and forgotten past, AI image synthesis is mainly a technique to curate moods and atmospheres, nameable, legible atmospheres of a generic pastness.
And this again, to come back to these kinds of images, is what makes this tool especially suitable for right-wing propaganda. If you look at contemporary neo-fascism, at its core it's deeply nostalgic. It's always striving to make America or England or Germany or whatever great again, and it never tells you quite clearly in which historical moment this supposed greatness actually should have happened. And these images imagine even the future, the year 2030, as a variation of a, in this case, deeply racist and reactionary past. And that's no coincidence; it actually corresponds very much to the inherent logic of these AI synthesis models and how they work. So platform realism presents us with a world of foreverized pastness. Present, past and future can only be imagined as interpolations and variations of already existing images. Now I come, and I think that is five minutes or something like that, to a last chapter, which is really shorter. And I start with an image which, I don't know, may look to you like it's AI-generated, like a Midjourney image, but it's actually not. And that's kind of the point. This is part of a series of images that formed a TV spot aired during the Super Bowl this year, the kind of most expensive slot in American television. And it was a spot that was paid for by an evangelical lobby group.
It was called He Gets Us, and the idea was to show images of people following in the footsteps of Jesus, washing each other's feet. So people were very much outraged about these images, and not so much about their content, but rather about their impression that these images were AI-generated, that these were cheap Midjourney images presented to them during Super Bowl commercial time, where the most prestigious commercials otherwise are aired. And actually the artist who was hired to make these images then posted on Instagram videos of the shootings of these photos, showing how much production value actually was put into these images, how elaborate the staging of the photographic set was, how many people were involved, and how expensive these images were, which, as I said, for many people looked like the cheapest images you can imagine, images produced by Midjourney. So the most expensive and the cheapest images today look almost the same, and that means something, I guess, and it means something for the question of the value of the image. And that's what interests me and what I'm still kind of thinking about. One can say that the image seems completely disentangled from its production, and that obviously relates to many descriptions of how contemporary platform capitalism works, for example this recent book by Anna Kornbluh, who talks a lot about flow as a central aesthetic and economic aspect of today's online culture, and about circulation, pointing out that circulation and not production is the main source of value, and that endless flows of images are mainly produced for the purpose of being circulated rather than being merely consumed. Too late capitalism, that's her word for our present, I think quite a good one: too late
capitalism's circulation-centricity inspires a historically contingent inflation of the image. And I think these images very much lend themselves to this kind of inflation of the image. And here is an example which, if you're on Facebook, you might have stumbled upon, because images like these and similar ones have been flooding Facebook for a few months now, for specific reasons. And every account that shares these kinds of images, for example, Malaya Golden, has hundreds and hundreds and hundreds of them. What happens is that the operators of these accounts are ultimately interested in redirecting users via links from Facebook to other sites, where they then either see masses of advertisement or someone even tries to obtain their personal data, so it's a phishing site then. But basically, before that, these images are meant to attract attention and to attract interaction and reactions, and the more interaction and reaction they get, the easier they flow into the feeds and timelines of people who have no connection to this account whatsoever, who have not subscribed to these accounts and groups but are fed these images because they are being interacted with. And that is of course artificially stimulated by bots, but it wouldn't work if people didn't actually interact with these kinds of images. And there are different kinds, and even genres, of clickbait that have been emerging in the last months, especially
on Facebook. And a particularly successful one works with this kind of tagline: made this with my own hands, rate my work please. And there are lots and lots of pseudo-photographic images of people, very often also children, presenting their self-baked cookies or the sculptures they made or these intricate chairs or whatever, asking for approval and asking for admiration of their skills. And that is an obvious way of soliciting reactions, and then some people fall for it and express their praise in the comments. And with this religious image there's also a lot of reaction, but most of the reaction is simply: amen. So there seems to be a whole culture on Facebook of people reacting to religious and pseudo-religious content. What's happening here is that, again, reactions and emotions are being solicited in a quantifiable way, and it works for those running these accounts because producing these images costs almost nothing.
These images are almost worthless in production, and their value lies only in the effects of interaction and circulation. And apparently, their strange imagery is a product of this kind of stochastic logic of almost random recombination of successful patterns. And that would be my final thesis, which links back to the beginning: these images, too, are kind of test images. And in a way, every AI image is a kind of test image. It's an image made to be evaluated, scored and ranked, both by humans and machines: humans acting and reacting in quantifiable, machine-readable formats, and machines trained to predict human reactions and interactions with images. So you have two main aspects of the value of the image today in networked image cultures. One is extractivist, and one, one could say, is psychotechnical. Either images, masses of images, are amassed and used as a source of patterns that then feed into the generation of endless flows of other images, or images are used as a kind of stimulus that elicits responses, generates clicks, comments and reactions. So images are either resources to be exploited or tests to be performed, or both. And the glossy aesthetics of these images, their kitschy symbolism and so on, is a product of that.
And it's mainly only the cover, I would say. So that's what I wanted to share with you. Thank you for listening, and yeah, I'm interested in the discussion. Thank you so much. Yeah, thanks a lot, Roland, for this amazing and wonderful talk. I was particularly struck by the very beginning, where you describe basically a situation in which platform realism sort of allegorizes ideas. It's not about realism; I'm not sure whether platform realism is about ideas, about making ideas visible. And for the rest of the talk, I was completely depressed by the thought that, you know, finally the mystery of Plato's cave is solved, right? I mean, you remember this tale of Plato's cave: people sitting in the cave, they only see the shadows of the ideas, which are the reality outside. But now we are outside, and the first guy we encounter is Jesus washing feet, right? And we see the ridiculousness and the tragedy of human ideas being staged, you know, in an almost serial manner. You could call it, I don't know, I have all these descriptions: generative revisionism in relation to history, optimized populism, automated populism, and so on and so on.
So that's basically the nature of the ideas we are faced with in that respect. I think that platform realism is great because it shows us that these ideas are ridiculous, are hopelessly tragic in a way. They are ridiculous and tragic. And that's also something that really struck me at the beginning of your lecture, when you were talking, well, maybe it was even Sam Altman's word, I don't remember exactly, about this being an allegory, right? That's my interpretation. Okay, but I think it's a really great interpretation, because I was reminded of something really old-fashioned, namely Benjamin's text about the Trauerspiel, The Origin of German Tragic Drama, I think it's called in English. And he talks about allegory being the main motif of the German tragedy, the German Trauerspiel. And he notices that there is a kind of paradox, because the Trauerspiel, this historical form of theater tragedy, is not really very sad. It's rather ludicrous. It's definitely ridiculous. And he wonders: how can something which is completely ridiculous be called tragic? And he comes up with a really great explanation. It's tragic because it totally, totally misses the point, right?
The allegory doesn't catch anything of what it wants to describe. It just misses the boat completely. And I think that's also the point with the allegories of platform realism, that they are tragic because they completely miss the point. And which point are they missing? They completely invisibilize the whole infrastructure they are based on, right? I mean, all the examples, et cetera, you were talking about: there wasn't a single example which was concerned with image production or factories or infrastructure or digital technology, anything like that. Only this kind of weird, fake, blurry, hazy illustration aesthetics. So I think that's really the mismatch. And that's what makes it so ridiculous: that it misses the point of talking about its own infrastructure. And if you want to respond to that, you're welcome. No, that's great. And I'm very thankful for this way of thinking further along these little hints I gave. And I think, yeah, I mean, I used allegory especially for the excessive nature of this image, because it's not just one image of boundlessness and the sublime, but all of it put together in one, right? So that was one of them.
It's more than you need to make a point. It doesn't just make a point, it makes it again and again and again, in a way. So that would be the rhetorical strategy of an image like this, if it's worth being interpreted. The other thing is the invisibility of both the labor and the digital infrastructure. Yeah, that's actually true. And then it's even more striking and interesting that you get these images of handicraft, of craftsmanship, as completely simulated, as something only synthesized to generate virtual, digital, quantifiable reactions. So there is a strange, and that relates to the question of nostalgia, I would say there's a strange shifting away from any idea of what could be the specificity of these images, towards simulating older techniques. Which has been done again and again in media history: new media kind of try to simulate or adopt or take on older media as their content. But these remediations are driven even further here. And yeah, the infrastructure is invisible, but, and I think that's the interesting thing, the digital infrastructure here is very much based on social media.
I mean, again, with social media, you interact with the interface and you don't see the labor and you don't see the click work and you don't see the content moderation. But these images are very much dependent on these kinds of interfaces. And I always thought it very telling that with Midjourney, for example, you interact via Discord, via a social media platform, right? You interact with a platform that already gives you all of these tools of ranking and scoring images. And that is very much part of the image-making process already. So in a way, they at least don't hide the media infrastructure so much; it's not visible in the image itself, but it's all over the interfaces with which you interact with these images, I would say. Yes, question. Okay. Thank you. This is such a fascinating, fascinating presentation, Roland. Now I want to see your entire project, not just the snippets, so great. So I have a question: I'm trying to see a sort of chronology of media production theory. When you started with the definition of platform realism, we see the traditional model of media production, where there is a producer that is trying to cater to consumers, the classic model of media capitalism, which later transitioned into interaction-related, circulation-related value generation with social media, etc.
And now it appears from your analysis that several strands of these media capitalist value systems are collapsing into a sort of image-driven, bombastic reality. Would you see it this way, or would you see the chronology differently? Would I see it as a collapse? Of which strand? I mean, I'm still trying to figure that out. Just yesterday, I was asked, even regarding the question of testing and every image being a test image: isn't that very much the case already in all of the culture industry, which already relied on audience reactions and statistical analysis? And what we now have is an enormous scaling up, an intensifying, a speeding up of these processes. But the logic behind that is maybe not completely new. Nevertheless, I think the scale does make a difference. And that's why these kinds of scam and spam accounts fascinate me so much, because this is not even content production anymore. I mean, you already have, with so-called user-generated content, an enormous overproduction and an enormous competition of images against each other, and a logic that does not invest so much in production, but generates value out of circulation and out of the marketplaces for user-generated content.
But now it's not even users behind that. So that's another, is it a collapse or is it another stage? I'm not sure. I think that remains to be found out. But I take your question as, for me, an open question. Thank you. No, I mean, obviously I chose the title with Benjamin and Susan Buck-Morss kind of in my head. And then, as you noted, it didn't really get into the talk, because it's still, I have to say, very much a work in progress. I think what I wanted to hint at is that these images, although they make something invisible, they also, in the masses of them, and that's also why I wanted to show so many of them, make something visible, something that is very inherent to digital online culture or contemporary platform capitalism: these ideas of images as a resource to be appropriated, a certain view of history as something that can be extracted, even colonized, the idea of images being perfectly legible and ready for pattern recognition, not only by machines, but increasingly also by humans, who have to react in very short time and identify the written or legible content in these images.
All these aspects, which are, I would say, older than this recent wave of image synthesis, now, for me, maybe collapse, or synthesize, in these kinds of floods of images that at once seem so absurd, fantastical, but also boring and redundant, images that make a lot invisible but that can also, in some respects, be read, and that's what I wanted to try, as kind of symptoms. At least if you try to identify the desires and wishes that are behind this kind of image production. For example, the desire and wish to fill in the gaps of the archive, to find images that have not been produced, or that could have been produced in some imaginary, hypothetical, interpolated past, for example. So yeah, the question of what would be the role of theory: it would be, of course, an analytical one. And I'm still refraining from being too psychoanalytical about that, but it would be trying to read these images as symptoms, I guess. I think there is a logical conclusion of how to answer your question about the role of the theorist, if you follow your logic, which is that, you know, in this kind of image circulation, the production is already vastly automated, you know, almost entirely automated.
You can imagine that the consumption is also partly automated, right? I mean, a lot of bots, et cetera, et cetera. So the next step is, of course, to automate the theorist, you know, and to have a fully automated system, which shows that the intrinsic desire of the system is basically to be fully automated and completely independent of input, you know? Do you think so? I mean, in the end, you always need, at some point, people, and you need not only people doing all that content moderation and optimizing these models and so on, but also: what are these images for if only bots interact with them? Then the bots can't be phished, the bots can't produce value on their own. So the question is how people are integrated, how they are integrated in these almost automated flows and feedback circuits of circulation. No, I mean, the whole apparatus of image production is not cheap at all. And of course, every single image is a waste of resources.
And if I had more time, I would also talk about that question: how wasteful is this kind of process? Because most of these images are waste, and not only because, to have these masses of images on the Facebook side, there are masses of images that are only generated and never seen. So it's even worse than that: in the whole production process itself, there are so many images that are never upscaled, never reacted to. They are instantly disposable. And of course, every single one of them needs resources. And even more, the production and training of these models needs lots and lots of resources. Nevertheless, for the economy of those, for example, running these kinds of scams, the costs are negligible. So that is an economic and an ecological problem that has to do with the real costs of this whole system of production, which are obviously outsourced and externalized. That was also a thought that was running through my head during this Super Bowl sequence. And I thought, actually, oh, this is great, because this is showing the real cost of a Midjourney-generated image.
I mean, not the replicated cinematic reality, but the huge investment that corporations are making into funding seemingly free compute, et cetera, et cetera. So in a way it's basically showing that, for once. I have a question; I really wonder what you think about this shift, not recent, but over the last year: the shift from a quest for photorealism into this kind of weird, nostalgic illustration, let's call it like this. Yeah, I mean, it has been there all along. Midjourney especially kind of began with images that looked very much like DeviantArt fantasy illustrations, very perfect within a very limited aesthetic range. And DALL-E started with this kind of tendency towards a stock photography look, and then they changed in every step and adapted to certain expectations. I think with OpenAI and DALL-E 3, the main motivation, in my view, was a limiting of risks and a policy of not being accused of producing deepfakes and fake news and disinformation, especially in a year of American elections.
So, at least in my experience, it is quite hard to get DALL-E now to produce an image that looks like a journalistic photograph. At best it looks like a very highly stylized stock photography or advertising image, with lots and lots of filters; it has a certain glow and certain aesthetics and looks. It doesn't make you think that this is a documentary photo, for example. And I think that is intentional, to stay out of trouble, because Altman doesn't want to be the one, in the end, who is responsible for any interference in an election, for example. I think that might be the bigger issue there. It's not so much like that with Midjourney; I think they are much more relentless, and it's also a much smaller firm. But I also see it all over other platforms. Yeah, Runway, for example, always showed only photorealistic imagery as example footage, and now it's all animation, you know? So it's like a sudden transition. But what I find even more interesting is that it is harder and harder to differentiate, because these quasi-photographic images are so aesthetically overproduced that they already look kind of generated.
And especially in AI video, by Sora and so on, you get these images that are supposed to be highly realistic but look like computer games at the same time. So I think that also has to do with certain visual expectations of realism that are very much fueled by a visual culture of gaming, for example. The people training these models, the people involved in the production, I think their aesthetic values actually come from gaming. And that also defines what counts as realism, in some respects. Yes. Very good question. Do you draw a distinction between synthetic media and deepfakes? I think synthetic media, for me, would be a very broad category that includes, I think, almost all of what I showed here. And deepfake, for me, would depend on the supposed intention of those spreading it. I mean, if it's a deepfake, the idea, or the intention, would be to disinform, right? Or there would at least be a supposed claim that this actually shows a real event. So deepfake, for me, depends more on the usage of images rather than their technical way of production.
But that's my kind of preliminary, personal way of using it, and I don't often use the term deepfake, because I think, for me, it would be reserved for very specific online practices rather than this kind of broad visual culture of synthetic media that we now see. But that's a kind of personal decision. Now, I have to admit that I haven't done any research in that regard myself; coming from media and cultural studies, I'm more of an observer. There is great research, which I also cited, especially on the statistical curation of LAION-5B, which is this dataset that is, more or less, for different reasons, less accessible. That was done in the context of Kate Crawford's group, and just recently, I think a few weeks ago, Models All the Way Down by Christo Buschek and Jer Thorp was published. So there is some research being done on datasets, as far as they are actually accessible, but for a lot of these models it's obviously not possible. And also the ways in which the fine-tuning is happening, I don't know, one would have to do ethnographic research really in the laboratories of these companies, which I guess is not that easy to do, but it would be worthwhile, I guess, yeah.
Thanks a lot. Yeah, thank you. Thank you. Thank you.