Day 8 - Computing Matter (Tobi Delbruck, Melika Payvand, Walter Senn)

 


Today felt like a shift in perspective. The discussions kept circling a deeper question:


What is the (biological and artificial) substrate of computation and how much does it matter?


Three speakers approached this from different angles:

  • Tobi Delbruck → sparsity and physical limits
  • Melika Payvand → richer neuron models and input-dependent dynamics
  • Walter Senn → dendrites, gain modulation, and links to attention/transformers

The morning discussions moved across levels:

from hardware constraints (sparsity), to neural (gated) dynamics, to cognitive function (attention in dendrites), and back again to...how do we design intelligent systems?


Sparsity dominates everything (?)


Tobi started from a deceptively simple question:

What is the actual operating regime of the brain?


He sparked discussion by sharing some numbers and observations that frame the brain’s efficiency (a quick sanity check of how they fit together follows the list):

  • ~10¹⁵ synapses
  • Avg ~1–10 Hz firing rates (very low)
  • ~10⁻¹⁴ J per synaptic event (energy per synaptic operation)
  • ~100 TOPS/W equivalent efficiency
  • 99.9% sparse in time (0.1 Hz firing rate, 10 ms time scale within neurons)
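
These numbers hang together surprisingly well. A quick back-of-the-envelope check (my own arithmetic, assuming the ~1 Hz end of the firing-rate range):

```python
synapses = 1e15            # ~10^15 synapses
rate_hz = 1.0              # ~1 Hz average firing rate (low end of the 1-10 Hz range)
energy_per_op = 1e-14      # ~10^-14 J per synaptic event

ops_per_second = synapses * rate_hz            # ~10^15 synaptic ops/s
power_watts = ops_per_second * energy_per_op   # ~10 W spent on synaptic events
tops_per_watt = (ops_per_second / power_watts) / 1e12
print(f"{power_watts:.0f} W, {tops_per_watt:.0f} TOPS/W")   # -> 10 W, 100 TOPS/W
```

Note that the TOPS/W figure is just the inverse of the per-event energy; the firing rate only sets the total power budget.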

We commonly think that the brain is efficient because it uses spikes, but Tobi’s argument was different:

The brain is not efficient because it spikes.

It is efficient because it is inactive most of the time and communicates very rarely.


We get 2 types of sparsity:

  • IN TIME - Temporal sparsity → neurons are silent/inactive 99.9% of the time
  • IN SPACE - Spatial sparsity → connections are dense locally, but globally >99% sparse

Overall, brains are 4-6 "9's" sparse (i.e. 99.99% to 99.9999% inactive/unconnected)

Critical provocations:

  • Is temporal sparsity the objective, or just a byproduct?
  • What are we optimizing? latency? total operations? average power?
  • Should we focus on where and when computation happens and not just how it is represented?

He also pushed back on bio-mimicry hype:

  • Dendrites in silicon are not practical
  • Electrons vs ions → electrons respond to electric fields ~10¹¹× faster
  • Chip area = money :( → biology ≠ good engineering blueprint?

Richer neurons and input-dependent dynamics


Melika’s talk was a direct challenge to a core assumption in neuromorphic engineering.

Maybe we picked the “wrong” computational neuron model/primitive as a field. 

She argues that we have been using Leaky Integrate-and-Fire (LIF) neurons, which are good as a communication interface but are very poor, overly simplistic computational units.


What is the problem with single LIF neurons? 

We get a fixed time constant (τ), a fixed temporal window/memory, and a hard reset after each spike (the neuron loses its context).

That translates to a very limited temporal context: a single LIF neuron cannot handle multi-scale temporal structure. We can compensate by stacking neurons, using RNNs and SNNs with recurrence, and distributing timescales across neurons. But learning weights and time constants together is relatively hard (although possible), with vanishing gradients and unstable optimization.
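
To make those limitations concrete, here is a minimal discrete-time LIF update (my own sketch; names and values are illustrative, not from the talk):

```python
import numpy as np

def lif_step(v, x, dt=1e-3, tau=20e-3, v_th=1.0):
    """One Euler step of a leaky integrate-and-fire neuron.
    tau is fixed, so the neuron has a single hard-wired temporal window,
    and the hard reset after a spike throws away accumulated context."""
    v = v + (dt / tau) * (x - v)    # leaky integration of the input x
    spike = v >= v_th               # threshold crossing
    v = np.where(spike, 0.0, v)     # hard reset: context is lost
    return v, spike
```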



Her key idea is to make time constants input-dependent. 

Then, in the case of an input with multi-scale dynamics, this would lead to a dynamical system where Z acts as a gate (that depends on the input) and h is the input projection. This works essentially like a continuous-time GRU (gated recurrent unit). We could get a neuron that decides what to remember AND what to ignore. That goes beyond passively integrating (standard simple LIF) to actively selecting information over time.
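
A minimal sketch of what that could look like (the gate z and input projection h follow the description above; the state s, the weights, and everything else are my own placeholders, not her actual model or circuit):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_step(s, x, W_z, U_z, W_h, dt=1e-3, tau=20e-3):
    """One Euler step of a GRU-like unit with an input-dependent
    effective time constant (tau / z).
    z ~ 0: hold the current state s (long memory, input ignored);
    z ~ 1: move quickly towards the input projection h (short memory)."""
    z = sigmoid(W_z @ x + U_z @ s)    # gate, decided by the input (and state)
    h = np.tanh(W_h @ x)              # input projection
    s = s + (dt / tau) * z * (h - s)  # the gate rescales the leak
    return s
```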


From a hardware angle, as you can see in the attached photos of the crossbar arrays, she connects this model to in-memory computing, where the physics does the multiply-and-accumulate (MAC) “for free”. We can use translinear circuits for the multiplications and gain modulation to implement the gating/decision mechanism Z (whether to integrate or ignore).
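
For intuition, the in-memory MAC is nothing more than the matrix-vector product below, except that in a crossbar Ohm's law does the multiplications and Kirchhoff's current law does the additions (a software analogy with arbitrary numbers, not a circuit model):

```python
import numpy as np

G = np.random.uniform(1e-6, 1e-4, size=(3, 4))  # conductances (S): 3 input rows x 4 output columns
v = np.array([0.10, 0.20, 0.05])                # voltages (V) applied to the rows
i = G.T @ v                                     # currents (A) summed on the columns = the MAC result
```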

In the end, computation isn’t just about adding weighted inputs. It’s about memory, gating, and dynamics working together. And this ends up being useful in both continuous and clocked time.


Connecting back to the start of the morning session, Melika’s point is that the limitation is NOT the spikes.

Maybe our neurons are just too simple to “use” time properly...



From soma to dendrites to attention 


Walter’s talk was the hardest to parse, but maybe the most conceptually rich. 

If Melika’s question was “how should a neuron compute?”, Walter’s was one level higher by trying to bridge biophysics, cognition, and modern AI.

How does computation become selection, memory, and ultimately cognition?






He started by explaining that in biology, neurons are not simple integrators and dendrites are not passive: they have proximal and distal inputs, there are back-propagating action potentials, and there are calcium spikes that amplify signals. In that sense, a single neuron is a multi-stage dynamical system.


Gain modulation then acts as a selective switch or attention gate (a multiplicative interaction). He builds the following analogy, mapping it to attention (as in transformers). The cortex selects relevant inputs via gain modulation (a topic he's been thinking about for a long time), and this acts similarly to attention weighting (which inputs matter and which should be suppressed). The query (q) is a top-down signal, the key (k) is a sensory representation, the value (v) is stored information, and the attention weighting is the gain modulation.
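
For reference, the transformer-side operation he is mapping onto, written to highlight the multiplicative "gain" reading (this is just textbook single-query attention, not his dendritic model):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attend(q, K, V):
    """Scaled dot-product attention for one query q:
    the top-down query sets multiplicative gains over the stored values V."""
    gain = softmax(K @ q / np.sqrt(q.size))  # which inputs matter (attention weights)
    return gain @ V                          # values re-weighted by those gains
```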

Two competing views then emerged in the audience: memory through dynamic (input-dependent) time constants, versus memory arising from recurrence, dendritic nonlinearities, and plasticity.

He argues that learning weights is much easier (learning what to remember) than learning time constants (learning how long to remember). 

In his view, the brain is not just storing the past; it is constantly deciding which parts of it still matter (a property called selectivity that ends up being very important in modern machine learning models), through mechanisms that give rise to computation similar to what we see in transformers.





Matthew walking us through an event-based implementation of the Hough transform for real-time inverted pendulum tracking. Equations on a flipchart, sea in the background, and the workgroup spot conveniently located next to the ping-pong table by the beach. Peak CapoCaccia energy 🤖 📈
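
I did not note down the details of Matthew's implementation, but an event-by-event Hough vote for a line looks roughly like this (my own sketch, with illustrative parameters):

```python
import numpy as np

def hough_event(acc, x, y, thetas, rho_bins, rho_max, decay=0.999):
    """Update a (theta, rho) accumulator with a single event at (x, y).
    Old votes decay so the peak tracks the moving pendulum; the accumulator
    argmax gives the current line (angle, offset)."""
    acc *= decay                                    # gradually forget old events
    rho = x * np.cos(thetas) + y * np.sin(thetas)   # one rho per candidate angle
    idx = np.clip(((rho + rho_max) / (2 * rho_max) * (rho_bins - 1)).astype(int),
                  0, rho_bins - 1)
    acc[np.arange(len(thetas)), idx] += 1.0         # one vote per angle
    return acc
```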


After the morning session, there was an interesting late afternoon discussion with Dan Goodman and others titled “Beyond LIF”, exploring how much neuron complexity we actually need.

He started by framing neurons more generally as dynamical systems with internal state variables, where the threshold operation becomes a key bottleneck in computation. More complex models, such as AdEx, resonate-and-fire, or dendritic compartment models, introduce richer temporal dynamics, potentially reducing the need for large numbers of parameters. A central theme was the trade-off between neuron complexity and network size, and how temporal processing (via time constants, delays, or filtering) can be implemented in multiple equivalent ways. Importantly, the group highlighted that making time constants adaptive or state-dependent, rather than fixed, aligns closely with ideas from neuromodulation and gated dynamics.
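
As one concrete example of "richer dynamics from a slightly more complex neuron", here is a minimal AdEx (adaptive exponential integrate-and-fire) update; the parameter values are commonly used defaults from the literature, not numbers from the discussion:

```python
import numpy as np

def adex_step(v, w, I, dt=1e-4, C=281e-12, gL=30e-9, EL=-70.6e-3,
              VT=-50.4e-3, DT=2e-3, tau_w=144e-3, a=4e-9, b=80.5e-12,
              v_reset=-70.6e-3, v_peak=20e-3):
    """One Euler step of an AdEx neuron. The adaptation variable w adds a
    second, slower timescale that a plain LIF neuron does not have."""
    dv = (-gL * (v - EL) + gL * DT * np.exp((v - VT) / DT) - w + I) / C
    dw = (a * (v - EL) - w) / tau_w
    v, w = v + dt * dv, w + dt * dw
    if v >= v_peak:               # spike: reset the voltage, bump the adaptation
        return v_reset, w + b, True
    return v, w, False
```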

Overall, the discussion emphasized that achieving multi-timescale computation likely requires combining richer neuron dynamics with structured temporal and spatial representations.










Comments

  1. LIF neurons are over-simplistic models - they treat all inputs the same way. Real neurons are much more complex - but what is it that matters? Approximately 70% of neocortical neurons are pyramidal neurons, which appear to have two sites of integration and a non-linear (and complicated(!) and modulatable) interaction between them. The suggestion has been made that the basal input area receives (primarily) external input (i.e. processed external input) and the apical input receives contextual input (from other inputs and from internal areas). (See W. A. Phillips' book "The Co-operative Neuron".)

    Ongoing work at Stirling (and elsewhere) by Adeel Ahsan has explored this area, suggesting (i) more efficient transformer-like systems and (ii) systems that are harder to mislead, because they take specific account of context.

    This fits quite well with the "gain modulation" material above; but it's also supported by a lot of work in neuroscience (Matthew Larkum's group in particular).

    All the best to all there - maybe one day!


