\(\mu\)P as Optimal Transport in a Vanilla MLP
Deriving \(\mu\)P as the unique scaling maximizing Wasserstein transport under stability constraints
A collection of thoughts, notes, and small projects, updated infrequently. I keep this page mainly for my own reference, to track interests that change fairly often, but hopefully some of the posts are interesting to others.
Designing an auction system for rooms in our house
A cool bit of math that falls apart empirically
Using symmetries to better understand how to design attention kernels
Some notes on Domingos' paper: Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
A derivation of the heat equation from a stochastic model of energy exchange
Some notes on the Eigenlearning paper by Simon et al.
Building up RKHSs from the Riesz Representation Theorem
How I got ~$500 worth of cookies for free
A rough derivation of the neural tangent kernel
A quick implementation of a Hopfield network in Python