Can AI Recognize the Saints? Evaluating Multimodal Models for Christian Iconography

Over the past few years, Generative Artificial Intelligence (e.g., ChatGPT) has been widely utilized for general-purpose tasks, including language translation, image generation from text prompts, and even code writing. However, can AI also be used for domain-specific tasks, such as analyzing historical artworks?

That was the question guiding my latest research, carried out with my PhD supervisors. We tested whether today's most advanced multimodal AI models can recognize Christian saints in works of art, and could therefore help curators identify and catalog these figures more efficiently.

Why Saints?

Christian iconography, which is the visual language of saints, biblical figures, and their symbols, has been studied for centuries. Scholars often rely on small details, such as a sword for Saint Paul or arrows for Saint Sebastian. These visual clues are rich in meaning and easily understood by scholars, but tricky for computers to grasp.

If AI can identify these figures, it could be a powerful assistant for museums, libraries, and archives that manage vast digital collections of artworks. Imagine a curator quickly finding all depictions of Saint Francis across thousands of paintings, without manually checking each image.

Saint Francis

Founder of the Order of Friars Minor (Franciscans), Francis(cus) of Assisi; possible attributes: book, crucifix, lily, skull, stigmata

Saint Paul

the apostle Paul of Tarsus; possible attributes: book, scroll, sword

Saint Peter

the apostle Peter, first bishop of Rome; possible attributes: book, cock, (upturned) cross, (triple) crozier, fish, key, scroll, ship, tiara

Saint Sebastian

the martyr Sebastian; possible attributes: arrow(s), bow, tree-trunk

Saint Catherine

the virgin martyr Catherine of Alexandria; possible attributes: book, crown, emperor Maxentius, palm-branch, ring, sword, wheel

Mary Magdalene

the penitent harlot Mary Magdalene; possible attributes: book (or scroll), crown, crown of thorns, crucifix, jar of ointment, mirror, musical instrument, palm-branch, rosary, scourge

What We Did

We compared two families of AI:

Vision-Language Models (CLIP and SigLIP), which connect images to words.
Multimodal Large Language Models (GPT-4o and Gemini 2.5), which can process text and images together.

We tested them on three collections of religious art in three experiments. For each image, we asked the AI to identify the depicted saint under one of three settings:

Just the names – asking the AI to match an image to a saint's name (e.g., Saint Sebastian).
Descriptions included – providing a short description of each saint's attributes (e.g., the martyr Sebastian; possible attributes: arrow(s), bow, tree-trunk).
A few examples – letting the AI learn from five examples before making its guesses.
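
To make the three settings concrete, here is a minimal Python sketch of how they could be set up. It is an illustration under stated assumptions, not our actual evaluation code: the prompt wording, the function names (label_texts, build_prompt, closest_label), and the image file name are all hypothetical. The last function shows the usual way CLIP-style models are used for this kind of task, picking the label whose text embedding is most similar to the image embedding; the vectors in the usage note are toy numbers, not real model outputs.

```python
from math import sqrt

# Class names and descriptions copied from the list of saints in this post.
SAINTS = {
    "Saint Paul": "the apostle Paul of Tarsus; possible attributes: book, scroll, sword",
    "Saint Sebastian": "the martyr Sebastian; possible attributes: arrow(s), bow, tree-trunk",
}


def label_texts(use_descriptions):
    """Text paired with each class: the bare name, or the name plus its description."""
    if use_descriptions:
        return {name: f"{name}: {desc}" for name, desc in SAINTS.items()}
    return {name: name for name in SAINTS}


def build_prompt(use_descriptions=False, examples=()):
    """Assemble a hypothetical text prompt for a multimodal LLM.

    `examples` is a sequence of (image_reference, saint_name) pairs used for
    the few-shot setting; the wording here is an assumption.
    """
    lines = ["Which saint is depicted in the image? Answer with one of:"]
    lines += [f"- {text}" for text in label_texts(use_descriptions).values()]
    for image_ref, saint in examples:
        lines.append(f"Example: {image_ref} -> {saint}")
    return "\n".join(lines)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_label(image_vec, text_vecs):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(text_vecs, key=lambda name: cosine(image_vec, text_vecs[name]))
```

Calling build_prompt() gives the names-only setting, build_prompt(use_descriptions=True) adds the descriptions, and passing a few (image, saint) pairs as examples gives the few-shot setting. For a vision-language model, closest_label([1.0, 0.1], {"Saint Paul": [0.9, 0.2], "Saint Sebastian": [0.1, 1.0]}) picks the label whose vector points in the most similar direction to the image vector.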

What We Found

Three findings stood out:

Modern AI models outperformed other approaches. Gemini 2.5 and GPT-4o, trained on general-purpose data, were more accurate than traditional machine learning systems trained specifically for this task.
Context matters. When we gave the models descriptions of the saints (like "Mary Magdalene, often shown with a book or crown"), they performed better.
Learning from a few examples wasn't always helpful. In some cases, adding examples confused the models instead of improving them.

So What?

This research demonstrates that AI can already assist us in navigating the complex world of art history. A crucial factor is the quality and curation of training data, which has a substantial impact on performance. This highlights the continued importance of expert knowledge in digital humanities applications.

The adoption of AI won't replace expert eyes; saints are still best recognized by trained scholars, especially in real-world scenarios where multiple saints are depicted in the same artwork. However, these models can still accelerate research and help identify patterns across vast collections.