Day 7 - The Real World (Charlotte Frenkel, Jörg Conradt, Guido de Croon)
After a (somewhat) relaxing Sunday, the Capocaccia workshop gets back on track with an engaging discussion about applying our insights to the dreaded "real" world. From staring into the biological and theoretical abyss, we are now talking about pragmatic applications that work.
A Wishlist
Our first discussion leader, Charlotte Frenkel, kicked things off by highlighting that deep networks waste a lot of energy performing the exact same computations for every single image or token. The holy grail is data-dependent computation: in Charlotte's words, letting "data shape your computation".
To get there, Charlotte presented her wishlist for an algorithm that would map well to real-world hardware:
- Local in space and time: Relying on backpropagation through time is highly inefficient because it lacks locality; computations should rely strictly on information available *right now*.
- Exploiting sparsity (event-based): The system should be lazy in the best way possible, computing *only* when new information arrives.
- A good inductive bias: We want structures like Convolutional Neural Networks (CNNs), which assume neighboring data points are related, granting us translation invariance and saving parameters.
- Robustness to temporal data: The real world is a "messy" place (like the brain or not like the brain, depending on which biologist you ask). Our algorithms need to tolerate that noise.
- Scalability: The golden rule of modern computing. If the network's footprint isn't scalable, it will never solve real-world problems.
So, what ticks all these boxes? Charlotte introduced the Event-Based Graph Neural Network.
Instead of processing static, empty background pixels, the graph is built dynamically as data arrives. Every new event is a node, and information only flows from past events to the current one. To figure out which events count as "neighbors", you use a spatio-temporal metric, cutting off any events beyond a predefined threshold.
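To make the neighbor rule concrete, here is a minimal sketch in NumPy, assuming a hypothetical space-time metric and made-up thresholds (the actual algorithm's distance function may differ):

```python
import numpy as np

def spatiotemporal_neighbors(events, new_event, r_max, dt_max):
    """Return indices of past events within a spatio-temporal radius.

    events: (N, 3) array of past events as (x, y, t); new_event: (3,).
    Information only flows from past to present, so we keep events
    with t < t_new that fall inside both cutoffs.
    """
    spatial_dist = np.linalg.norm(events[:, :2] - new_event[:2], axis=1)
    dt = new_event[2] - events[:, 2]  # time elapsed since each past event
    mask = (dt > 0) & (spatial_dist <= r_max) & (dt <= dt_max)
    return np.nonzero(mask)[0]

# toy usage: three past events, one new event at (1, 1, t=4.0)
past = np.array([[0.0, 0.0, 1.0],   # too old (dt = 3.0 > dt_max)
                 [1.0, 1.0, 3.9],   # close in space and time -> neighbor
                 [9.0, 9.0, 3.5]])  # too far in space
new = np.array([1.0, 1.0, 4.0])
print(spatiotemporal_neighbors(past, new, r_max=2.0, dt_max=1.0))  # → [1]
```

Each incoming event then becomes a new node wired only to the neighbors this test returns.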
A particularly convenient feature of this algorithm is that you don't have to recompute the whole graph for every new node. Instead, you can just fetch the cached state of older nodes from memory. It's a classic compute-to-memory tradeoff, but it can be navigated in a very hardware-friendly way by using a cheap, local cache (like SRAM) to make fetching those features affordable.
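In code, that tradeoff can be as simple as memoizing per-node features. A minimal sketch, with `embed` standing in for whatever (hypothetical) expensive per-node computation the network performs:

```python
cache = {}  # node_id -> feature vector; stands in for on-chip SRAM

def embed(event):
    """Placeholder for an expensive per-node feature computation."""
    x, y, t = event
    return (x + y, t)

def node_features(node_id, event):
    """Compute a node's features once, then fetch them from the cache."""
    if node_id not in cache:
        cache[node_id] = embed(event)   # pay the compute cost once
    return cache[node_id]               # every later access is a cheap fetch
```

The memory footprint grows with the graph, which is exactly the compute-to-memory tradeoff: you trade SRAM capacity for not re-running the network on old events.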
What is this event-based graph neural network good for? Anything that requires low latency, provides sparse inputs, and contains mostly local relationships in time and space: for example, object detection in automotive vision. With this exciting preview of future applications of the algorithm and her hardware, Charlotte left us thinking about what else we could engineer with event-based graph neural networks.
A "Reality" Check
Following the coffee break, the gloves came off (somewhat appropriate for a session titled "The Real World"). Jörg Conradt, putting on his engineering hat (did he ever take it off?), dropped a provocative bomb: The neuromorphic community is tremendously underperforming.
While the community lauds itself for low-latency and low-power theoretical capabilities, Jörg pointed out that traditional engineers simply do not care if a line-following robot uses a low power budget if it doesn't actually zoom fast. They want hard numbers and demos that outperform current standards on tasks people actually care about (such as making line-following robots drive at Autobahn speeds).
The crux of the issue? Usability.
- Jörg called out that many brilliant custom neuromorphic chips, and setups like Tobi's incredibly functional sailing boat, are painstakingly hard for non-experts to use.
- The turning point for event cameras 20 years ago was when Tobi and Patrick Lichtsteiner turned their elegant bio-inspired circuit into a USB camera and developed software so the computer vision and robotics community could use it.
- Today, we are badly missing that next layer of "plug-and-play" abstraction. We need the neuromorphic equivalent of typing `import torch` in Python (or asking Claude to `import torch` if we're being totally honest).
As to be expected in this passionate community, this sparked some debate about the tension between basic science and engineering. Is our goal to deeply understand physical systems, or to build a viable product?
While some argue we shouldn't force scientists to become product developers, others pointed out that a lack of standardization is actively hurting adoption and visibility (even for the scientists). Efforts like the Neuromorphic Intermediate Representation (NIR) exist to create a standard graph format for spiking neural computation, but adoption is painfully fragmented. Every major project (from IBM to Intel to BrainScaleS) rebuilds its APIs from scratch.

Furthermore, as the discussion highlighted, we need to redefine "performance." If we play by standard machine learning's rules, performance just means accuracy. If we play by Tobi's rules, it means time-to-market. But for a robot in a closed-loop environment, 99.9% accuracy isn't as vital as low latency and energy efficiency (depending on the application, of course). Quite possibly, this is where neuromorphic could shine (or blink brightly, like the Lu.i neurons in the picture).
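To illustrate what a shared intermediate representation buys you, here is a deliberately simplified sketch. This is *not* the actual NIR API, just a hypothetical illustration of the idea: one graph of typed nodes and edges that any backend, simulator or chip, can consume.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str     # e.g. "Input", "Linear", "LIF"
    params: dict  # backend-agnostic parameters

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: list = field(default_factory=list)  # (src, dst) pairs

# a two-layer spiking network described once, consumable anywhere
g = Graph()
g.nodes["in"] = Node("Input", {"shape": (2,)})
g.nodes["fc"] = Node("Linear", {"in": 2, "out": 4})
g.nodes["lif"] = Node("LIF", {"tau": 0.02, "v_th": 1.0})
g.edges += [("in", "fc"), ("fc", "lif")]
```

Each backend then needs exactly one importer for this format, rather than every project rebuilding its own stack from scratch.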
A "Reality" Gap
Next up was Guido de Croon, who brought the discussion back to real stuff we build, specifically flying robots. Using David Marr's levels of analysis as a framework (renaming the top level "strategic"), Guido illustrated how high-level navigational strategies (like whether a robot uses 3D maps like GPS or path integration like honeybees) dictate the algorithms and ultimately the underlying hardware required.
Guido expressed his version of the neuromorphic dream: everything processed via a single neural network on a single neuromorphic chip on a neuromorphic robot. However, this is incredibly difficult compared to the standard robotics approach of slapping a CPU and GPU onto a quadrotor.
Why bother with a neuromorphic approach, then? As one audience member asked, "Are you just making a rod for your own back?" (I'm actually not quite sure what this was supposed to mean, but it seemed relevant enough to include in this report.)
The answer lies in EXTREME robotics. If you want to build insect-sized, milligram-scale drones, every fraction of a watt and gram matters. Guido shared an example of researchers eliminating a less-than-one-gram gyroscope to save weight on a fly-sized drone, opting to rely entirely on vision for attitude control. Even on larger drones, ditching a heavy, power-hungry GPU can create a snowball effect, saving weight on the battery and the motors needed to lift it, which in turn increases flight time and speed.
Guido then tackled the infamous reality gap. This refers to the phenomenon where a robot trained perfectly in simulation immediately crashes into a wall in the real world. This happens because simulations are clean, while the real world is plagued by noise, discrete time issues, and the fact that every single drone has unique physical features related to its embodiment.
How do we cross this gap? The workshop brainstormed several solutions:
- Domain Randomization: Varying parameters (mass, RPM, inertia) in the simulation during training to force the network to become robust to variations.
- Transfer Learning: Aligning the latent space of the simulation with real-world sensory data.
- Abstractions: Superimposing simplified problems on sensory input (e.g., using a network to extract optical flow rather than raw color pixels, which vary wildly in reality).
- World Models: Instead of constantly reacting to raw sensory data, the drone builds a "world model" in its latent space. It predicts how its perception will change when it takes an action (e.g., "If I tilt right, the target in front of me moves left"). Using this approach, Guido's team could even train a drone in its "dreams", predicting outcomes and rewards entirely within its world model with no simulation of the actual real world.
The final result of applying these solutions? A drone that flew faster than top human racers. The drawback, of course, is the "black box" engineering problem: if an end-to-end neural network is slightly slower than a competitor's, intervening to fix it in the moment requires complicated reward-function shaping rather than tweaking a simple line of code. From this viewpoint, mechanistic interpretability in neural networks isn't just a concern for esoteric scientific understanding; it's a useful tool for practical engineering purposes.
The overall takeaway of the morning?
We have the models, we have the chips, we have the drones, and we have the energy efficiency. Now, we just need to pack it into a box (or drone) that the rest of the world can actually use.