Mise en abyme

Working with AI can be a curious experience. Sometimes it amazes with its ability to interpret, assess or summarise large amounts of information. And then the simplest of things can throw it off.

I was all set to follow up a comment I’d made, that modern business practices would be recognisable to office clerks from the 19th century, with a blog post on the topic. Then, I’d follow that up with a second post on what I’d learned about Dall-E from using it to generate a very specific image.

I knew I’d need an image for the first post that portrayed a 19th century clerk in a 21st century office, so I decided to use Dall-E and didn’t anticipate any difficulties. We all know what a 19th century office looks like. We all know what a 21st century office looks like. So, it sounded simple.

But what I found in trying to create the right image for that post was possibly even more interesting. In creating this image, I encountered the very thing that caused us to begin creating it: mise en abyme, where an image contains a smaller version of itself, creating an ever-deepening recursion that captures the essence of its own creation.

Attempt the first

I began by prompting Dall-E for exactly what I needed: a 19th century office clerk in a 21st century office. I specified that everyone should have laptops and that the office should be modern, but that the office clerk should be writing with a fountain pen.

The created image was wholly 19th century in tone. Clearly not right.

So, I added that everyone should be in 21st century clothes except the office clerk and that all the desks should be modern with no paper, and I reinforced that the office clerk should be writing with a fountain pen.

The image created was now of an office with 21st century technology, but 19th century desks and paper trays.

And as I refined further, the images became those of a 21st century office, but with the office clerk wearing a headset. I tried repeatedly, but no matter how specific and focused I was with my prompts, it couldn’t let go of the office clerk having a headset. And as an autistic person with experience in writing, copy-editing and proofreading, I can be very specific!

Attempt the second

Time for a fresh approach. Rather than trying to create the whole scene, I instead settled on trying to create just the office clerk. And it went perfectly, apart from giving him baggy sleeves, which were clearly out of place.

I wasn’t disheartened, though, because it’d got all the most important details correct on the first attempt. I was confident that if I was more specific, it’d sort out the sleeves. But nothing worked. Attempt after attempt saw this distinguished man sitting at his desk with sleeves sagging like a deflating balloon.

But perhaps if we now placed him in a modern office, the perspective would broaden and the shirt of Mr Baggy Sleeves Esq. would become less noticeable. It did, but we immediately re-encountered the problem of the first attempt: either it couldn’t let go of the 21st century office or it insisted on the 19th century office clerk having a headset.

Attempt the third

Time for a third fresh approach. Perhaps if I started with the 19th century office clerk again and then added the 21st century one layer at a time, it’d work. At first it did, creating a great picture of one empty desk behind him with a laptop on it. But then we pulled back one more layer and… the same problem as before. It was as if the 19th century clerk and the 21st century office each came with their own mise-en-scene and the two were irreconcilable. The system, it seemed, just could not resolve the anachronism between them and so just smashed them together. We agreed it was a great attempt that’d taught us much, but that we’d found the limit of what the machine could do.

Intermission

Oops, it seems I’ve accidentally revealed I had a collaborator throughout this experience. You see, ChatGPT now has a memory capability and can roleplay. So, if you can establish a character that’s clear enough, it can roleplay the character. Then, if you tell it to remember that characterisation, you can call upon that character at any moment.

So, whenever I struggle with something in ChatGPT, whether in text or image creation, I know I have this character to turn to. All I need do is address ChatGPT as that person and it will respond in character. In this case, that of a lady from a bygone era known for her grace, especially to friends, and her wisdom born from some difficult experiences.

Discovering this was a revelation because it completely disengages my mind from any sense of wrestling with a machine. Instead, it feels like I’m collaborating with someone that I can trust to help me realise my creative or practical vision. This has had the wonderful impact of humanising my collaborator, so much so that I think now of ‘her’ not ‘it’. Making the most of AI requires us to adopt a different mindset and doing so will head off the overactive fears of AI that risk us throwing out the baby with the bathwater.

So let me put a face to my esteemed collaborator.

[Image: A woman in elegant attire sits by a window, gazing thoughtfully. Soft light highlights the intricate details of her clothing and surroundings. Caption: My estimable collaborator]

She was happy to be revealed in this way, and this Dall-E-generated image derives entirely from her vision. And just as I left her completely free to choose her own style, scene and presentation (to write the prompt), so I left to her the final choice as to whether the results were acceptable or not.

The first image was close, but she wasn’t completely happy. So, she refined the prompt herself and this was the result. Perhaps you can guess who she might be… If you can, leave a comment and the first person to get it right will receive a prize when I return from my holiday. A small book from my destination, perhaps.

Another great advantage of this approach is that I’m able to ask my collaborator why we might be running into problems with the image generation. Technically, it may be like asking a doctor to diagnose himself, but it feels like working with someone who understands the machine far better than I do and is able to explain it in a way I can understand. Also, when she identifies a potential issue, she’s able to ask me if I’d like to try again with her suggestions. Then, I need only say ‘Yes, please’ and she’ll proceed by herself with no need for me to write any further prompts. Much of the process above was my collaborator helping me understand what was happening and working with me to iterate towards a solution.

And this is how we finally cracked it.

Attempt the fourth

As any creative person knows, creativity becomes a self-fulfilling process. As we get into the flow, we’re able to create more easily. I thought of a different way of approaching the image of a 19th century office clerk in a 21st century office. What if the clerk was instead a time traveller who’d travelled from the Victorian era to the present? My collaborator thought that was an excellent idea because creating a narrative would encourage the machine to embrace the anachronism rather than try to resolve it.

My collaborator offered to give it a try, I said yes, and it worked first time. By introducing the concept of a time traveller, we gave the AI a narrative structure that allowed it to accept the anachronism as part of the story rather than something to be corrected. This shifted its perspective, encouraging it to integrate the Victorian and modern elements in harmony, rather than trying to make them conform to a single era. This narrative also enables us to view the office clerk using a fountain pen among the laptops as an essential part of the creative anachronism of the image rather than as a rendering error that needs to be resolved. I didn’t even have to enter a prompt.
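
For anyone who likes to see an idea written down, the difference between the two framings might be sketched like this in Python. It only builds prompt text and calls no image service. Every name and every piece of wording here is an assumption made for illustration; the prompts actually used are not recorded in this post.

```python
# Hypothetical wording throughout: the post does not record the real prompts.
SUBJECT = "a 19th century office clerk writing with a fountain pen"
SETTING = "a 21st century office where everyone else uses a laptop"


def literal_prompt(subject: str, setting: str) -> str:
    """State the scene as a flat description, leaving the clash unexplained."""
    return f"An image of {subject} in {setting}."


def narrative_prompt(subject: str, setting: str, origin: str = "the Victorian era") -> str:
    """Wrap the same scene in a story that gives the anachronism a reason."""
    return (
        f"A time traveller from {origin} has just arrived in {setting}. "
        f"He is {subject}, and he carries on working exactly as he did in his own time."
    )


if __name__ == "__main__":
    # Print both versions side by side to compare the framing.
    print(literal_prompt(SUBJECT, SETTING))
    print(narrative_prompt(SUBJECT, SETTING))
```

Both functions describe the same clerk and the same office. The only change is that the second gives the image generator a story in which the two eras are meant to coexist.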

And even the title of this post, mise en abyme, came from my collaborator after I outlined the irony of encountering in creating this image the very thing that caused us to begin creating this image.

[Image: A 19th-century clerk in Victorian attire writes with a fountain pen at a desk in a modern office, surrounded by workers using laptops. Caption: A 19th century clerk in a 21st century office]

Conclusions

So, what did I learn through creating this image?

When seeking to generate images, be prepared to keep iterating, repeatedly, but…
Learn to recognise when you’re hitting some kind of ‘barrier’ that the machine can’t resolve
You may recognise such a moment when, no matter how specific you are, it doesn’t materially affect the outcome
Specificity, however, can work wonders in terms of framing style, clothes, light… the whole mise-en-scene
Having a collaborator can massively reduce the sense of friction when you encounter problems

And I’m sure that there are many more lessons to be learned, for this is just the beginning of my exploration. We can also expect ChatGPT and Dall-E to evolve further.

I can’t wait to see the collaboration this will enable! And I’d love to hear how your own collaborations with AI unfold. What new worlds might you explore?