From Prompt to Image: The Literary Implications of Text-to-Image AI

Text-to-image AI is transforming writing into a visual medium, expanding literature through interpretatoin, writes Ana María Caballero.

Listen to this Article

Main Image:
Scylla
, Ana María Caballero (2026).
Courtesy of the artist.

Text-to-image artificial intelligence is often described as a breakthrough in visual creativity. But at its core, it’s a shift in language. Generative AI systems do not begin with images – they begin with words. And, in doing so, they are expanding what it means to write.

As prompts become images, writing becomes something more than text on a page. It becomes a way of generating visual worlds that are material and rooted in collectivity. This has significant implications for literature, where writers have always depended on readers’ personal image banks – their imagination – to conjure visuals from poetic devices.

I began my creative career as a writer, publishing multiple books before extending my practice into visual and digital forms. From the outset, my intention was clear: verse deserves to exist in the same plane as visual art – galleries, museums, and public installations – and I set out to position it there. The arrival of generative AI did not mark a departure from writing, but a continuation of it through another medium.

My interest has never been in AI-generated poetry. What’s fascinating is not the idea of literate statistical models, but language itself as an instrument for generating images. In this sense, text-to-image AI represents a linguistic evolution – one we’re only beginning to articulate.

Prompting is a literary act. It requires precision, intuition, and a sensitivity to tone and association. Like poetry, it depends on selecting the right words and on revision. Prompts must be tested, adjusted, rewritten, and even redacted. Words are swapped, syntax is tightened, tone is recalibrated. Language must often be bent – sometimes forcefully – to produce a desired image.

And yet, even at its most precise, language has limits. Though it’s a system built on discrete unit – words, syntax, structure – it’s open to personal interpretation, just like images. This tension reveals something important: prompting exists between two different systems of meaning – one textual, one visual.

Pioneering generative artist Mario Klingemann, known for his work with neural networks and projects such as Botto, has long explored this boundary between human intention and machine interpretation. He notes: “As expressive and versatile as natural language seems to be, I find it often too ‘quantized’ to carry the full weight of my creative intent—so I prefer to speak to machines in their native tongue, navigating latent space directly through tensors and embedding vectors, where meaning is more fluid and continuous rather than discrete.”

If language is not enough, then prompting becomes something hybrid – part literary, part computational – where meaning is shaped both by what we say and by how systems interpret it.

Equally important is selection. Generative systems produce variation; the artist defines meaning through choice, often rooting subsequent images in one that serves as an anchor. The act of selecting an output – deciding which image holds – is inseparable from authorship. Writing does not end with the prompt; it continues through curation.

The nuances between language, interpretation, and selection are central to my projects Paperwork and Speech Patterns, where I incorporated individual audience responses to my performed poetry into a system where human readership produces the art. At various readings, attendees were invited to write down a single word – a distilled moment of connection to verse. These words became central to the inputs through which I summoned digital paper sculptures from latent space.

Angustia” from Paperwork, Ana María Caballero (2023).
Courtesy of the artist.

Writers have always created visuals inside readers’ minds. With both Paperwork and Speech Patterns, such images began to take form outside the mind – fixed, but never singular. Each word functioned as a fragment of interpretation – a reading made visible. In transforming these fragments into visual form, each response unfolded into something tactile, spatial, shared. The generated sculptures are a convergence of multiple possible readings, shaped by language and by the vast visual memories embedded within AI systems.

This process is part and parcel to what I think of as analog generativity: the notion that readers and writers construct each other in an endlessly procreative process. Different words mean different things to different people. Text-to-image AI makes this process visible, rendering interpretation as output and turning something ephemeral into a shareable record.

The final work becomes less about a singular output and more about the process of its creation – an approach echoed in Fabiola Larios’s Wild Wired World, an AI-driven film that traces the evolution of the internet from its early utopian promise to its current entanglement with surveillance and data extraction. Drawing on vast digital archives, the work assembles a layered visual narrative that reflects how collective memory is not only stored but continuously rewritten. Reflecting on the omnipresence of these systems, Larios notes, “After my mom told me God was always watching me, in my teenage years I thought about surveillance as omnipresent and always watching me, like God.”

Wild Wired World, Fabiola Larios (2023). Installation view from the Perez Art Museum.
Courtesy of the artist and Karli Evans.

In a project titled Being Borges, I approach prompting as a form of translation. I engage with The Book of Imaginary Beings by Jorge Luis Borges and Margarita Guerrero, translating text into image across multiple layers. I begin with the original Spanish, generating a suite of images through prompting, then repeating the process using the English translation Borges himself supervised. Finally, I write my own poem – not a translation, but a recasting of the original through the lens of my own poetic interests.

These layers – Spanish, English, personal – are assembled alongside the images they produce. The compositions function as both translation and self-portrait. Like in Sophie Calle’s Suite Venitienne, when we describe our approach, we describe ourselves.

The project highlights both the potential and the limits of AI. Different languages produce different images because they’re different systems of meaning. The gap between them isn’t an error – it’s the point, revealing the impossibility of translation. Indeed, Borges himself famously said: “The original is unfaithful to the translation.”

Images, too, are unstable – they shift according to context, memory and the viewer’s gaze. In moving from text to image, we do not move from multiplicity to fixity, but from one kind of instability to another.

In Being Borges, Borges’s descriptions become prompts, but the resulting images are not definitive. They’re placed alongside the texts that generated them, allowing meaning to emerge in the space between.

Projects like Being Borges invite viewers to also face their relationships with text vis à vis images. Does a description of a Chinese unicorn allow our minds to wonder? Do we have a fixed notion of it? Or does the AI image – the translation – close doors within our imagination, anchoring us to fixed notions of what this creature might look like? Perhaps words are more open-ended than we allow.

Generative AI also opens new possibilities for rewriting narratives. Artist Danielle King’s project Mother examines how generative AI interprets and reconstructs motherhood, using transgressive literary texts on the subject to probe both the system and its deeply ingrained biases. By juxtaposing these narratives against the model’s outputs, she reveals how such biases are not only reflected in language but embedded within the system itself, testing the limits of how motherhood can be represented and reimagined.

King reflects: “Writers like Rachel Cusk, Anne Lamott, Olga Ravn and Kate Baer all write about motherhood with an unflinching honesty that our society routinely suppresses exposing the ambivalence and contradictions, detailing the body’s transformation and scars, describing the sacred as well as the grotesque. Their work helped shape the language of my prompts, as I asked the machine to see what culture has trained it not to.”

This aligns with literature’s long-standing role in questioning dominant narratives. What changes with AI is the speed – and the fact that this rewriting can now occur visually as well as textually.

At the same time, these tools allow for preservation. In Pace, I reconstructed a pharmacy in Madrid tied to personal memory, preserving it through AI after its physical disappearance. The result is neither document nor fiction, but a narrative that holds both memory and dream.

Pace, still image, Ana María Caballero (2025).
Courtesy of the artist.

This impulse toward preservation is not isolated. Projects such as Holly Herndon and Mat Dryhurst’s The Call, developed with the Serpentine Galleries, similarly explore how AI can record and reanimate cultural memory – training models on collective vocal traditions to ensure they persist and evolve. As they note, the work is rooted in a desire to “make the full process of generative AI legible,” transforming what is often abstract into something “visceral and emotional” while foregrounding the human experience embedded in training data.

What emerges is not a merging of disciplines, but an expansion of literature, giving rise to writing aimed less with description and more with activation.

And yet, for all its novelty, this shift returns to something familiar. Literature already operates at the edge of reality. Text-to-image AI does not replace this – it sharpens it.

 

© IE Insights.