Schedule

08:00 – 08:05

Introduction

Prof. Andrew Davison

Imperial College London

08:05 – 08:30

On-Sensor Computer Vision with Pixel Processor Arrays

Prof. Piotr Dudek

University of Manchester

Abstract

Vision computing in challenging edge application scenarios, such as control of agile autonomous robots or eye-tracking for smart glasses, requires low-latency computation within stringent power and space budgets. This is challenging for conventional hardware pipelines, where performance and energy efficiency are limited by the large amount of data movement between system components (sensors, processors, accelerators, memory). Pixel Processor Arrays are a new class of vision sensor devices that exploit recent advances in semiconductor technology, embedding processors within the pixels of the image sensor array. These ‘vision chips’ perform preliminary image processing directly on the focal plane, with only a small amount of relevant information transmitted out of the vision sensor device. The tight integration of sensing, processing, and memory within a massively parallel computing architecture results in high performance coupled with energy efficiency. This talk will give an overview of the field of image sensing and processing hardware from the perspective of in-pixel computing. The key technologies and concepts behind vision sensor devices will be introduced. The lecture will be illustrated with case studies of CMOS vision chips developed at the University of Manchester, with application demonstrations in control of agile ground and aerial vehicles, keypoint extraction and tracking, high-speed visual odometry, computational photography, eye tracking, ultra-low-power video analytics, high-speed machine vision, etc. Insights and perspectives on the future development of integrated sensor-processor vision systems will be provided.

08:30 – 09:00

Retinomorphic Vision Sensors

Prof. Kwabena Boahen

Stanford University

Abstract

Event cameras achieve a higher (effective) sampling rate and shorter latency than frame cameras. Events from a Dynamic Vision Sensor (DVS) report temporal contrast (changes in log-luminance), but coherent optical flow produces incoherent events. For instance, when a DVS camera views a cluttered scene from a moving platform (e.g., a drone), an abrupt edge triggers events with much shorter latency than a smooth edge, and the contrast required to trigger an event is much higher for the latter. Unlike a DVS, a Retinomorphic Vision Sensor (RVS) takes the ratio between local and surrounding luminance to report spatial contrast, adapts locally to optical flow (speed) to preserve temporal coherence across space, and detects globally coherent activity to suppress background motion. In the past, incorporating this processing on the sensor (~40 transistors/pixel) sacrificed fill-factor and pixel count. Only recently did it become possible to combine efficient photo-transduction with dense mixed-signal processing by hybrid-bonding a Back-side Illuminated (BI) CMOS Image Sensor (CIS) wafer with a Mixed-Signal (MS) CMOS wafer fabricated in a finer process.

09:00 – 09:30

Beyond Pixels: Co-Designing What On-Sensor Vision Emits, Hides, and Trades Off

Prof. Xuan 'Silvia' Zhang

Northeastern University

Abstract

Modern edge vision systems are bottlenecked not by computation, but by the data the sensor must read out and transmit. Raw pixels are simultaneously an energy bottleneck dominated by ADC and off-chip communication and an information bottleneck for downstream tasks. The right sensor output is rarely an image; deciding what it should be is an algorithm–hardware co-design problem spanning optics, pixel circuits, and downstream models. This talk organizes our recent work into three threads. LeCA, BlissCam, and SnapPix push compression into the pixel array, turning video into information-rich coded outputs and cutting edge energy by up to an order of magnitude. PrivateEye and HoloCode use diffractive optics and metasurfaces to separate task-relevant from sensitive features before the photodiode, enabling near-zero-energy privacy preservation. CamJ and our quantitative modeling framework let us reason about energy, autonomy, and task accuracy as a single co-design problem.

09:30 – 10:00

On-Sensor Computer Vision: Heterogeneous Architectures for Low-Latency Perception

Dr. Mika Laiho

Kovilta

Abstract

On-sensor computing is emerging as a powerful approach to reducing latency, energy consumption, and off-chip data transfers in modern vision systems. By integrating sensing and computation on the same chip, it enables real-time perception directly at the source of data. However, this integration also introduces new design constraints, including limited flexibility, shared manufacturing technology, and tight area budgets—requiring a fundamental rethinking of both hardware architectures and their coupling to vision algorithms. In this talk, I will explore heterogeneous on-sensor computer vision architectures, focusing on latency-critical applications such as collision avoidance in autonomous drones, robots, and vehicles. I will discuss how efficient hardware–algorithm co-design and careful dataflow planning can eliminate bottlenecks and reduce intermediate storage. As a case study, I will present a prototype heterogeneous on-sensor vision chip (RECER S1), which integrates a 640×480 pixel array with multiple specialized computing cores, including pitch-matched column-parallel processing units, a 2D cellular neural network, associative memory, and an embedded RISC-V processor. I will explain how its key architectural choices and dataflow strategy enable low-latency processing directly on the sensor.

10:00 – 11:00

Posters / Demos with coffee

Posters

Demos

11:00 – 11:30

Towards Agentic Computational Photography

Prof. Gordon Wetzstein

Stanford University

Abstract

Neural networks and advanced image processing algorithms excel in a wide variety of computer vision applications, but their high performance often comes at a steep computational and bandwidth cost. In this talk, we explore a shift from passive capture to agentic computational photography—a paradigm where imaging systems dynamically adapt their acquisition strategy to the task at hand. We first discuss hybrid optical-digital co-design strategies that outsource intensive computations into the optical domain, enabling processing at the speed of light with minimal power. Building on this foundation, we introduce task-aware foveated imaging systems that treat sensor acquisition as a learned attention policy. By leveraging dual-stream architectures and closing the perception-acquisition loop, these systems intelligently allocate bandwidth to critical regions of interest in real-time. This convergence of optical computing and adaptive acquisition opens new frontiers for intelligent imaging systems capable of high-performance perception under strict power and latency constraints.

11:30 – 12:00

Speck: Where Sparse Sensing Meets Sparse Neural Processing

Dr. Mina Khoei

Director of AI applications at SynSense

Abstract

This talk provides an overview of Speck, a smart vision sensor that integrates a Dynamic Vision Sensor (DVS) with a Spiking Neural Network (SNN) processor. Speck processes data in a fully asynchronous fashion, preserving the sparsity of recorded light changes through spike-based processing. As a result, it offers a low-power, low-latency solution built on spiking CNNs (sCNNs). The talk will cover the sensor’s architecture and highlight some implemented applications.

12:00 – 12:30

AI sensors for efficient, personalized Contextual AI

Dr. Barbara De Salvo (replacing Dr. Richard Newcombe)

Lunch