Reprogramming Sensory Habits
Luba Elliott explores what the latest research into AI and creativity reveals about the changing nature of how we see and its impacts on contemporary image making. The curator of the AI Art Gallery at the Computer Vision and Pattern Recognition (CVPR) conference speaks to three of the shortlisted artists: Tom White, Justin Urbach and Karyn Nakamura.
Elliott’s gallery is exhibiting 70 works in person at Nashville’s Music City Center (June 13–15, 2025) and a total of 100+ online at thecvf-art.com.
Tom White
Luba Elliott: Tom, we are delighted to premiere your Atlas of Perception at CVPR. What inspired the project? How did you select—and name—its many visual elements?
Tom White: I've always been fascinated by the inner world of AI systems, specifically how machines see and understand the world. With the latest advances in model interpretability, we can decompose how these systems work into visual concepts. These concepts are the visual building blocks of how the model works. It's tremendous fun exploring this subfeature space, which is often strange and at first nonsensical.
In this project I present roughly fifty of these concepts from Google's SigLIP vision model—a multimodal image-text model from Google DeepMind similar to OpenAI's CLIP but with improved training techniques that enable efficient scaling and better performance. I then show detailed animated visualizations of the imagery that triggers each concept.
The main criterion for including the visual elements was that the visualization had to be "self-activating." This is a property I came up with that I believe is critical for a faithful presentation: every image in the artwork itself has been verified to actually activate the SigLIP concept it represents.
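As a rough illustration of what such a verification could look like, here is a minimal sketch, not White's actual pipeline, that scores an image against a single SigLIP concept assumed to be available as a direction vector in the model's image-embedding space; the checkpoint name, the concept vector and the activation threshold are placeholders.

```python
# Minimal sketch of a "self-activating" check, assuming a concept is given as a
# direction vector in SigLIP's image-embedding space (e.g. from an
# interpretability method). Checkpoint, vector and threshold are placeholders.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

def concept_activation(image: Image.Image, concept_direction: torch.Tensor) -> float:
    """Project the image's SigLIP embedding onto a (hypothetical) concept direction."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)          # shape (1, dim)
    features = features / features.norm(dim=-1, keepdim=True)  # unit length
    return float(features @ concept_direction)

def is_self_activating(image, concept_direction, threshold=0.2) -> bool:
    # Keep a visualization only if it actually fires the concept it depicts.
    return concept_activation(image, concept_direction) > threshold
```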
I was actually hesitant to give the selected elements names at all but felt it was necessary so that they would not be known simply as Concept 1182. These labels are really just nicknames that I whimsically chose after the fact so that I have a way to refer to each concept.
Luba Elliott: The idea of understanding the world based only on archetypal visual elements reminds me of humanity’s various pictorial communication systems: hieroglyphics, pictograms and now emoji. What parallels do you see between these universal visual communication systems and our general visual perception?
Tom White: That's an interesting metaphor, but the primitives I'm exploring aren't currently used for communication. They are more like elements deep within the hidden "grammar" of computer vision. A useful metaphor for this is color theory. It's fascinating and non-intuitive that everything we see digitally can be broken down into discrete elements and then represented using only the primary colors: red, green and blue (RGB).
In the same way, at a higher level in computer vision, it appears all of the "chunks" within the visual field can be understood with an analogous fixed set of "primary visual concepts" that we are beginning to understand.
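To make the color-theory side of the analogy concrete, the toy snippet below splits a digital image into its red, green and blue primaries and recombines them losslessly; the filename is a placeholder.

```python
# Toy illustration of the RGB analogy: any digital image decomposes into three
# primary channels and can be rebuilt from them. Filename is a placeholder.
from PIL import Image

image = Image.open("example.jpg").convert("RGB")
red, green, blue = image.split()                  # three single-channel primaries
rebuilt = Image.merge("RGB", (red, green, blue))  # recombine losslessly
assert list(rebuilt.getdata()) == list(image.getdata())
```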
There is certainly a parallel between these communication systems and the exploration of these primitives of perception. As we begin to understand these primary visual concepts and the ways they can be combined, it should be possible to use this knowledge to develop new types of visual communication systems.
Luba Elliott: The flower sculpture of Atlas of Perception is a spectacle. What is the logic behind using that as a form? To me it recalls both Edward Ihnatowicz’s SAM in Cybernetic Serendipity and the contemporary pop art cartoons of Takashi Murakami. You also called it a windmill, which suggests Cervantes’s Don Quixote—and misperception.

Tom White: The feature space within a visual model is abstract compared to that of a physical drawing. My intent was to capture this by creating a presentation that felt equally amorphous—without a logical beginning or end. The sculpture's form is driven by a radial arrangement of screens, which presents an animated, spinning, interlocking image.
Another factor that influenced the scale was the desire to show each of these individual concepts as an immersive superstimulus. Often "feature visualization" works are presented as modest sketches, but I've found they can be incredibly detailed and nuanced, definitely worthy of a grander presentation.
Your references were not front of mind when making this but are certainly relevant in retrospect, as they are all instances of forms that seem to jump off the page and not be confined to a screen.
One of the amazing things about the new generation of synthetic imagery tools is that they have this ability to power new types of viewing experiences that are not confined to small rectangular surfaces.
Luba Elliott: Your work with these synthetic imagery tools frequently probes the differences between machine and human vision. How do you feel machine vision has evolved over the years in comparison to how we see the world?
Tom White: My own work over the past eight years has gone through three main phases, which track how machine vision has also progressed over the past decade. My earliest work used inference-time compute with vision classifiers to search for shapes and forms that aligned with trained categories. This stage reflects the practice at the time: begin with a human-crafted ontology and training set, then use it to automate decision-making.
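As a loose sketch of that first phase, and not White's actual method, one could run gradient ascent on an image so that a pretrained ImageNet classifier's score for a chosen category increases; the model, target class, step count and learning rate below are assumptions, and serious feature-visualization work adds regularization on top.

```python
# Hedged sketch: search for an image that a pretrained classifier "reads" as a
# chosen category via gradient ascent on the pixels. Model choice, target class
# and hyperparameters are placeholders; real work adds blur, jitter and priors.
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2").eval()

def ascend_toward(target_class: int, steps: int = 200, lr: float = 0.05) -> torch.Tensor:
    image = torch.rand(1, 3, 224, 224, requires_grad=True)   # start from noise
    optimizer = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        score = model(image)[0, target_class]
        (-score).backward()                                  # maximize the class score
        optimizer.step()
        image.data.clamp_(0.0, 1.0)                          # keep pixels displayable
    return image.detach()
```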
The second phase began with the introduction of language modeling into computer vision. The pre-baked categories were no longer needed as models could be trained all at once on internet-scale datasets of images and captions. My work across this stage was creating art from early text-to-image systems, which elicited more ephemeral visual categories.
The third phase is a continuation of this arc. With the latest techniques in reinforcement learning, the internet itself potentially becomes a less important "bootstrapping step" on the way to models discovering new types of knowledge or patterns on their own.
Interpretability research has the right grounding for this era because the field is asking, "What does the system have to tell us?"
This work, Atlas of Perception, is an attempt to visually expose the hard-earned knowledge within these systems and see what the AI can tell us about our visual world.
Justin Urbach
Artist Justin Urbach is joined by AI researchers Alexander Koenig and William East to discuss their project BLINDHAED.

Luba Elliott: For this project, BLINDHAED, which came first, the technology or the idea for its artistic exploitation?
Justin Urbach: The idea came first. The question of how we see the world—how the human eye might be extended or transformed through technology—has been central to my artistic practice for a while. In earlier exhibitions, I was already working with the visual language of technological vision—laser-engraving camera calibration patterns onto monitors, for instance—to explore how machines perceive. I was interested in what it means to see in different layers or frequencies and how that changes our understanding of the world.
From that foundation, I started speaking with Alex [Koenig], who works with robotic systems, and shared my ideas around vision and perception. He told me about new cameras they were using in the lab for robotic perception that operate differently from traditional ones.
It felt like a natural progression to move from a conceptual investigation of vision to actually engaging with machine vision tools—not just using them but critically and creatively exploring them as a way to reflect on human perception itself.
Luba Elliott: Tell us more about these new cameras. What are the unique affordances made possible by event-based vision?
Alex Koenig: A regular camera forms an image by recording the brightness at every location in the image. Event cameras only respond to local changes in brightness. Therefore, an event-based camera perceives motion because motion changes the local brightness. These cameras are still in their infancy and are mostly a research tool.
However, they offer unique affordances for engineering applications: they are efficient because only the changes in a scene produce data; they are incredibly high speed because each pixel asynchronously speaks to the computer; and they have a high dynamic range, meaning they can perceive bright and dim areas in the scene at the same time.
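A crude way to get a feel for this is to simulate events from ordinary video, as in the sketch below; it assumes grayscale frames stacked in a NumPy array and a hand-picked contrast threshold, and it only approximates what a real asynchronous sensor does.

```python
import numpy as np

def frames_to_events(frames: np.ndarray, threshold: float = 0.2):
    """Toy event-camera simulator: emit (t, x, y, polarity) wherever the
    log-brightness of a pixel changes by more than `threshold` between frames.
    `frames` is assumed to be a (T, H, W) grayscale array; a real sensor fires
    asynchronously per pixel rather than frame by frame."""
    log_ref = np.log(frames[0].astype(np.float32) + 1e-3)
    events = []
    for t in range(1, len(frames)):
        log_cur = np.log(frames[t].astype(np.float32) + 1e-3)
        diff = log_cur - log_ref
        fired = np.abs(diff) > threshold
        for y, x in zip(*np.where(fired)):
            events.append((t, int(x), int(y), 1 if diff[y, x] > 0 else -1))
        log_ref[fired] = log_cur[fired]   # reset only where an event fired
    return events
```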
We wanted to bring event cameras into the arts because they produce radically different, otherworldly imagery that offers a novel artistic interpretation of vision itself.
William East: One other unique facet of event-based vision is the ability to manipulate the data stream to show a trace of a movement across time from different angles. By reorganizing the event data, it is possible to transcend the traditional linear time progression of movement, allowing us to reevaluate how we perceive motion.
For example, you can expand a brief moment into a long sequence, revealing details of rapid movements that would be imperceptible to traditional cameras. You also have the possibility of seeing an event from multiple temporal vantage points simultaneously, like an eye, free from time. I think this is where their otherworldliness comes from.
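Building on the toy simulator above, reorganizing the event stream amounts to accumulating whichever time window (or ordering) you choose into an image; the snippet below assumes the same (t, x, y, polarity) event tuples.

```python
import numpy as np

def accumulate_events(events, t_start, t_end, shape):
    """Collapse an arbitrary slice of the event stream into a single image,
    so a brief moment can be stretched out, replayed in a different order or
    viewed from several temporal vantage points at once."""
    frame = np.zeros(shape, dtype=np.int32)
    for t, x, y, polarity in events:
        if t_start <= t < t_end:
            frame[y, x] += polarity
    return frame
```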
Luba Elliott: Can you talk more about how event-based vision enhances human and machine vision?
Alex Koenig: Event-based cameras are bio-inspired and are sometimes even called “silicon retinas” because they work similarly to the human retina. Our eye doesn’t have a shutter capturing sixty frames per second like a video camera. Instead, our retina has many individual sensory cells that asynchronously talk to our brain as light stimulates them—like an event camera.
Event cameras live in the digital age. They chop our continuous world into discrete bits, offering a new perceptual reality.
The intricate connection between event cameras and human eyes opened up a substantial and direct thematic link between high-fidelity technology and our fragile human body, which is constantly transformed by high-tech.
William East: Adding to that, the frontier of possibility that event-based vision affords us also reveals a new precipice of limitation. For example, with event-based vision, color plays an entirely different, codified role in the distinction of movement. It becomes subordinated to the functional purpose of differentiation.
In traditional imaging, color plays a rich and multifaceted role, conveying mood, depth and realism; in event-based vision, color is used solely to delineate change from stasis.
This limitation in color representation pushes us to reconsider how we interpret visual information. Without the crutch of realistic color reproduction, we're forced to focus on patterns, textures and the flow of movement. But it also raises the question, what do mood, depth and realism mean in a world viewed through an event-based lens? This juxtaposition of event-based vision with the cues established in the times of the regular camera acts as a reminder that technological change always drags with it the ghosts of its precursors.
Luba Elliott: For your recent exhibition in Germany, you laser-etched display screens, a procedure that draws parallels to laser eye surgery. However, the bodies in question are quite different—one is sensitive flesh, the other a cold physical screen. What is the significance and effect of laser etching on such different materials? How does laser etching impact the display of content on the screen?
Justin Urbach: The act of laser-etching the screens was both a symbolic and physical intervention. Conceptually, it draws a parallel between human and machine vision—between the surgical precision used to correct human sight and the technological calibration of visual devices. One operates on living tissue, the other on synthetic surfaces, yet both alter perception.
By etching into the display itself, I’m not just showing an image on the screen but physically transforming the screen as a visual interface. The laser creates scars—interruptions in the pixel grid—that permanently affect how the image is rendered. This interference forces the content to interact with its own surface, making the act of seeing no longer transparent. You become aware of the screen as an object, as a membrane, not just a window.
In a way, the screen becomes wounded—just like an eye undergoing surgery. But unlike corrective surgery, which aims for clarity, this procedure introduces distortion. It questions the desire for ever-sharper, ever-clearer vision. It's a way of materializing the invisible structures and biases of machine vision, making them physically present—etched into the device rather than hidden behind it.
Luba Elliott: BLINDHAED explores seeing in a world shaped by technology and body enhancement. In the future, how can humans incorporate event-based vision into their sensory repertoire? Would it be through operations or physical devices?
Justin Urbach: Rather than thinking in terms of direct incorporation, I see event-based vision as introducing a new perceptual model—one that humans might engage with through proximity and interaction rather than assimilation. External systems such as wearable sensors, spatial interfaces, or intelligent environments could act as sensory extensions, not to mimic the eye, but to challenge and expand how we attend to the world.
What interests me more than the technical integration is how these systems reshape our relationship to perception itself. Event-based vision doesn’t “see” in images but in shifts—motions, contrasts, events. It’s a logic of difference rather than absolute representation. If we begin to sense through this paradigm, it could alter how we experience time, presence and causality.
In this way, even without physical fusion, these technologies could reprogram our sensory habits, shifting us from a static, image-based mode of seeing toward a more procedural, responsive and fluid engagement with reality.
Karyn Nakamura

Luba Elliott: In your project, Surface Tension, you work with microscopic footage of animating physical neurons as a vehicle for exploring how visual media is shaping our realities. What were your inspirations?
Karyn Nakamura: The installation revolves around a particular kernel of truth: the raw footage from a bright-field microscope. The footage captures the process of animating individual neurons—the cellular matter in human brains—by physically moving them on a glass slide. Prior to this project, I worked in visual forensics at a nonprofit, evaluating how AI computer vision models interpret video evidence and studying people’s heuristics for AI-generated images.
When all you have is a few seconds of video to understand what happened in an event, your reality of that event depends entirely on how the image was constructed: where the camera was located, how it captures images and the many layers of analysis that come afterwards. These are all parameters that can be pretty fuzzy.
There are many instances outside of forensics where images are our interface to information and knowledge. Images can be manipulated to highlight certain aspects and fade other aspects.
An image is just one surface of the reality that it tries to capture.
The microscope footage served as another example of a piece of media that gives us access to a reality we can’t see with our own eyes.
Luba Elliott: In Surface Tension, you use a range of computational processes, such as color inversions and diffusion models, to transform microscopic footage. What made you choose these particular computational processes and what meaning lies behind their aesthetics?
Karyn Nakamura: Each generated texture serves as another surface of the image. The process begins with the physical manipulation of neurons under a microscope, a direct interaction with biological reality. This physical process is then reimagined through distortion and diffusion, adding layers of body-like textures onto the image. The textures cycle through, each one concealing and revealing different facets of the original image, filtering and molding our perception of the neurons. Images at various stages in the technological reconstruction of the absurd reality morph in and out, slide over and under each other, wearing the traces of their own making like a skin.
The computational neural networks underlying diffusion are modeled after the very biological structures they are used here to embellish and reimagine. Each step—the physical manipulation, the computational model and the conceptual metaphor—informs the other by passing around pieces of media that travel across the lines of reality and simulation.
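As a very rough sketch of that kind of layering, and not Nakamura's actual process, one could invert a microscope frame's colors and pass it through an off-the-shelf image-to-image diffusion pipeline; the model name, file path and prompt below are placeholders.

```python
# Hedged sketch of color inversion followed by diffusion-based re-texturing.
# Not the artist's pipeline; model, path and prompt are placeholders.
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("neuron_frame.png").convert("RGB")  # placeholder microscope still
inverted = ImageOps.invert(frame)                      # simple color inversion

# A moderate strength keeps the original footage legible beneath the new texture.
result = pipe(prompt="skin-like organic texture", image=inverted, strength=0.5).images[0]
result.save("surface_tension_frame.png")
```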
Luba Elliott: For the original display of Surface Tension at the SHOWStudio Gallery, you incorporated various technological elements, including contemporary recording and transmission devices such as security cameras, TVs and drones. Meanwhile, the project itself relies on established scientific tools like microscopes and tweezers. How do each of those technologies impact vision?
Karyn Nakamura: The installation at SHOWStudio unfolds across five screens, a wall of projection and a slide projector modified into a microscope that projects the actual specimens on the slides used under the optical tweezers, creating a meta-animation of its own process.
A choreographed drone weaves through the space, capturing a single, curated perspective of the work. This is broadcast on a TV in the front gallery, where a large mirror sculpture obscures the full view of the main space. Like the microscope that lets us peek into the microscopic world, the drone's eye becomes the final layer of the chain of technologies and the only way to view the installation, another mediated gaze, filtering reality through a particular perspective, shaping what can be seen and what remains hidden.
-----
Luba Elliott is a curator and researcher specializing in creative AI. She works to engage the broader public about the developments in creative AI through talks, events and exhibitions at venues across the art, business and technology spectrum, including The Serpentine Galleries, V&A Museum, Feral File, ZKM Karlsruhe, Impakt Festival, NeurIPS and CVPR. Elliott’s AI Art Gallery at CVPR is exhibiting 70 works in person at Nashville’s Music City Center (June 13–15, 2025) and a total of 100+ online at thecvf-art.com.
Tom White is an artist and researcher whose work explores the intersection of human and machine perception. With over 25 years of experience in artificial intelligence and design, White studied at MIT under John Maeda and worked alongside notable figures such as Casey Reas, Golan Levin and Ben Fry, contributing to the foundations of creative coding tools that evolved into Processing and openFrameworks. He currently teaches at Victoria University of Wellington School of Design. White continues to blend artistic expression with the latest AI research.
Justin Urbach is a media artist and filmmaker whose work investigates the evolving relationship between humans, technology and ecological systems. Combining experimental filmmaking, immersive media and speculative design, his projects create multi-layered narratives that engage with perception, environmental transformation and post-anthropocentric imaginaries.
Alexander Koenig is a researcher and artist based in Berlin. Alex is currently pursuing his PhD in embodied intelligence at TU Berlin, where he studies the control of robot hands.
William East is a multidisciplinary creator and artist working in Berlin. He is currently developing software for a new generation of smartphones designed to reshape our relationship with technology.
Karyn Nakamura is a Tokyo-born, New York-based artist and visual forensics researcher. Her work explores the interplay between media, technology and human agency as well as the social and technical infrastructures that shape communication.