Sabine Süsstrunk
École Polytechnique Fédérale de Lausanne
On generating image and video hallucinations
“Hallucination” is a term used in the AI community to describe the plausible falsehoods produced by deep generative neural networks. It is often considered a negative, especially in relation to large language models or medical image reconstruction. Yet, in many computational photography applications, we rely on such hallucinations to create pleasing images. It often does not matter whether all (or any) of the information was present in the real world, as long as the produced falsehoods are visually plausible.
Starting from that premise, I will present our recent work on hallucinations in image reconstruction, image style creation, and texture synthesis, using different generative models such as diffusion networks, neural radiance fields, and neural cellular automata. With a nod to the dangers some of these hallucinations might pose, I will also briefly discuss our work on deepfake detection.
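As a rough illustration of the last of those models, here is a minimal neural cellular automaton update step in Python. It is a sketch of the general technique, not the speaker's trained system; every grid size, filter, and weight in it is invented.

    import numpy as np

    # Minimal neural cellular automaton (NCA) step: grid cells carry a state
    # vector; each step perceives local gradients and applies a small learned
    # update. Weights here are random stand-ins, not trained parameters.

    H, W, C = 64, 64, 12           # grid size and channels per cell
    rng = np.random.default_rng(0)
    state = rng.normal(0, 0.1, (H, W, C))

    # Fixed perception filters: Sobel-x and Sobel-y, applied per channel.
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float) / 8.0
    sobel_y = sobel_x.T

    def conv2d_same(img, kernel):
        """Depthwise 3x3 convolution with wrap-around (toroidal) padding."""
        out = np.zeros_like(img)
        for dy in range(-1, 2):
            for dx in range(-1, 2):
                out += kernel[dy + 1, dx + 1] * np.roll(img, (dy, dx), axis=(0, 1))
        return out

    # Tiny "update network" acting on each cell's perception vector
    # (in practice the last layer is zero-initialized so the untrained rule
    # does nothing; small random values are used here so the demo moves).
    W1 = rng.normal(0, 0.1, (3 * C, 32))
    W2 = rng.normal(0, 0.01, (32, C))

    def nca_step(state, fire_rate=0.5):
        percept = np.concatenate(
            [state, conv2d_same(state, sobel_x), conv2d_same(state, sobel_y)],
            axis=-1)                                # (H, W, 3C)
        hidden = np.maximum(percept @ W1, 0.0)      # ReLU
        delta = hidden @ W2                         # per-cell state update
        mask = rng.random((H, W, 1)) < fire_rate    # stochastic cell updates
        return state + delta * mask

    for _ in range(10):
        state = nca_step(state)
    print(state.shape)   # (64, 64, 12); a few channels would be read as RGB

For texture synthesis, training backpropagates through many such steps so that the visible channels settle into the target texture.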
Mahdi Soltanolkotabi
University of Southern California
Theoretical Foundations of Feature Learning
One of the major transformations in modern learning is that contemporary models trained through gradient descent can learn versatile representations that transfer effectively across a broad range of downstream tasks. Existing theory, however, suggests that neural networks trained via gradient descent behave similarly to kernel methods, which fail to learn transferable representations.
In the first part of this talk, I will try to bridge this discrepancy by showing that gradient descent on neural networks can indeed learn a broad spectrum of functions that kernel methods struggle with, by acquiring task-relevant representations.
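To make the kernel view concrete: in the so-called lazy regime, a wide network linearized around its random initialization is equivalent to regression with a fixed kernel, the empirical neural tangent kernel (NTK). The NumPy sketch below illustrates that equivalence on a toy problem; it is not from the talk, and all shapes and targets are arbitrary.

    import numpy as np

    # A wide two-layer network, linearized at its random initialization,
    # acts like regression with a fixed kernel: the empirical NTK.

    rng = np.random.default_rng(0)
    d, m, n = 5, 2048, 40           # input dim, width, number of samples

    W = rng.normal(0, 1, (m, d))    # first-layer weights at initialization
    a = rng.choice([-1.0, 1.0], m)  # fixed second layer, +/-1 entries

    def grad_f(x):
        """Gradient of f(x) = a . relu(Wx) / sqrt(m) w.r.t. W, flattened."""
        pre = W @ x
        act = (pre > 0).astype(float)            # ReLU derivative per neuron
        return ((a * act)[:, None] * x[None, :] / np.sqrt(m)).ravel()

    X = rng.normal(0, 1, (n, d))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2     # some target function

    # Empirical NTK Gram matrix: K[i, j] = <grad f(x_i), grad f(x_j)>.
    G = np.stack([grad_f(x) for x in X])         # (n, m*d) tangent features
    K = G @ G.T

    # Kernel ridge regression with the frozen tangent features: this is
    # what gradient descent on the linearized network converges to.
    alpha = np.linalg.solve(K + 1e-3 * np.eye(n), y)

    x_test = rng.normal(0, 1, d)
    k_test = G @ grad_f(x_test)
    print("NTK prediction:", k_test @ alpha)

The features here are frozen at initialization, which is precisely why this regime cannot account for transferable representations; the talk's first part concerns the regime beyond it.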
In the second part of the talk, I will focus on feature learning in prompt-tuning, an emerging strategy for adapting large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. We demystify how prompt-tuning enables the model to focus attention on context-relevant information and features.
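For readers unfamiliar with the mechanism, the following toy sketch shows where a soft prompt enters a transformer: a small matrix of trainable embeddings is prepended to the frozen token embeddings before self-attention. All dimensions and weights are illustrative stand-ins, not the speaker's setup.

    import numpy as np

    # Soft prompt-tuning in miniature: the prompt matrix is the ONLY
    # trainable parameter; the rest of the model stays frozen.

    rng = np.random.default_rng(0)
    d, n_tokens, n_prompt = 16, 8, 4

    token_emb = rng.normal(0, 1, (n_tokens, d))   # frozen input embeddings
    prompt = rng.normal(0, 0.02, (n_prompt, d))   # learnable soft prompt

    # Frozen single-head self-attention weights standing in for the LLM.
    Wq, Wk, Wv = (rng.normal(0, d ** -0.5, (d, d)) for _ in range(3))

    def attend(x):
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    x = np.concatenate([prompt, token_emb], axis=0)   # prepend soft prompt
    out = attend(x)

    # Gradient descent would update `prompt` only; the real tokens' attention
    # can then shift toward prompt positions carrying task-relevant features.
    print(out.shape)   # (n_prompt + n_tokens, d)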
Fauzia Ahmad
Temple University
Near-field Radar Imaging – From Physics-Based to Learning-Based Frameworks
Many modern radar applications do not conform to the far-field propagation model, the primary departure being spherical rather than planar wavefronts. The wavefront curvature has a major impact on system performance and must be accounted for in the scene reconstruction procedure for effective and reliable imaging. Near-field imaging techniques are, therefore, essential for short-range applications such as ground-penetrating radar, biomedical radar, and automotive radar. Popular frameworks for solving near-field radar imaging problems include high-resolution subspace-based methods, sparse reconstruction, and, more recently, learning-based approaches.
This talk aims to provide an overview of these near-field image reconstruction frameworks and to offer several illustrative examples from diverse radar applications.
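To see why the wavefront curvature matters computationally, consider the simplest physics-based imager, delay-and-sum back-projection, where each pixel is focused using the exact spherical round-trip distance to every antenna rather than a planar-wavefront approximation. The following is a minimal, self-contained sketch with made-up geometry and parameters, not code from the talk.

    import numpy as np

    # Near-field back-projection for a monostatic stepped-frequency radar.
    c = 3e8
    freqs = np.linspace(2e9, 4e9, 64)             # frequency sweep (Hz)
    xa = np.linspace(-0.5, 0.5, 21)               # antenna x-positions (m), y=0

    target = np.array([0.1, 0.8])                 # point scatterer at (x, y)

    # Simulated measurements: phase set by the round-trip distance.
    dist_t = np.hypot(xa - target[0], target[1])  # antenna-to-target range
    S = np.exp(-2j * np.pi * freqs[None, :] * 2 * dist_t[:, None] / c)

    # Back-projection over an image grid with exact spherical-wavefront delays.
    xs = np.linspace(-0.6, 0.6, 121)
    ys = np.linspace(0.4, 1.2, 81)
    image = np.zeros((len(ys), len(xs)), dtype=complex)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            d = np.hypot(xa - x, y)               # per-antenna range to pixel
            phase = np.exp(2j * np.pi * freqs[None, :] * 2 * d[:, None] / c)
            image[i, j] = (S * phase).sum()       # coherent focusing sum

    peak = np.unravel_index(np.abs(image).argmax(), image.shape)
    print("peak at (x, y) =", xs[peak[1]], ys[peak[0]])   # near (0.1, 0.8)

Replacing the per-pixel distance d with a plane-wave approximation would defocus short-range targets, which is the failure mode near-field frameworks are designed to avoid.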
Ricardo Carmona Galán
Instituto de Microelectrónica de Sevilla (IMSE-CNM)
Visual information cues from a CMOS front-end sensor chip
CMOS image sensors are widely used in digital cameras and mobile phones because of their low power consumption, high speed, and capability to integrate multiple functionalities. The major driver of their development has been increasing spatial and temporal resolution. All pixels must be sampled and digitized before any visual processing can take place.
However, many application scenarios, such as autonomous driving, augmented and virtual reality, and the AIoT, would benefit from earlier and more efficient processing of the visual information. Moreover, object recognition and classification no longer rely on prescribed features. The deep learning approach has revolutionized recognition, surpassing human accuracy on some benchmarks, but this has been achieved at the expense of considerable power and a large amount of computing and memory resources.
In this talk, we will show how compressive sampling can be exploited to extract the relevant content of the visual stimulus right at the sensor chip, thus allowing a lightweight and power-aware implementation of high-level inference.
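As a toy illustration of the principle (not the presented chip), the sketch below simulates a focal-plane array that reads out only a few pseudo-random masked sums of its pixels and runs a lightweight classifier directly on those measurements; every shape, mask, and task in it is invented.

    import numpy as np

    # Inference directly from compressive measurements: the "chip" applies
    # pseudo-random binary masks to the pixel array and reports only the
    # masked sums, never the full image.

    rng = np.random.default_rng(0)
    n_pix, n_meas = 32 * 32, 64          # 1024 pixels -> 64 measurements

    # On-chip measurement operator: Bernoulli +/-1 masks, one per measurement.
    Phi = rng.choice([-1.0, 1.0], (n_meas, n_pix))

    def sense(image):
        """Compressive readout: inner products with the masks."""
        return Phi @ image.ravel()

    # Two synthetic scene classes: bright-left vs bright-right.
    def sample(cls):
        img = rng.normal(0, 0.1, (32, 32))
        img[:, :16] += 1.0 if cls == 0 else 0.0
        img[:, 16:] += 1.0 if cls == 1 else 0.0
        return img

    X = np.stack([sense(sample(c)) for c in range(2) for _ in range(200)])
    y = np.repeat([0, 1], 200)

    # Lightweight linear classifier on the measurements (least-squares fit).
    w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], 2.0 * y - 1.0,
                            rcond=None)

    test = sense(sample(1))
    print("predicted class:", int(np.r_[test, 1.0] @ w > 0))   # expect 1

The point of the sketch is the data path: high-level inference operates on a 16x-compressed readout, which is what makes a lightweight, power-aware implementation plausible at the sensor.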