
Research

My interests generally lie in statistical/machine learning theory for overparameterized models, particularly questions relating to generalization, implicit bias, and learning dynamics, often using tools from optimization and statistical physics.

My current work spans deep learning theory in the DeWeese lab at BAIR, neutron production in (\(\alpha\),n) reactions at LLNL, and information geometry (generously supported by a grant from VESSL AI).

Google Scholar: link. ORCID: 0009-0004-1252-1679.

Papers

A Theory of Saddle Escape in Deep Nonlinear Networks
Divit Rawal, Michael R. DeWeese
Preprint, 2026.
In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universality classes. On the permutation-symmetric submanifold, the identity combines with an approximate balance law to reduce the full matrix flow to a scalar ODE, giving a critical-depth escape time law \(\tau_\star = \Theta(\varepsilon^{-(r-2)})\) governed by the number \(r\) of layers at the bottleneck scale rather than the total depth \(L\). We find that this same \(r-2\) exponent is recovered under He-normal initialization with \(r\) bottleneck layers rescaled by \(\varepsilon\), where the symmetry manifold is preserved by the flow but not attracting. Numerical simulations show close agreement with this theory.
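
As a quick numerical illustration of the escape-time law (a toy sketch of mine, not the paper's code): integrating one scalar flow consistent with the reduction above, \(\dot a = a^{r-1}\) with \(a(0) = \varepsilon\), recovers the \(\varepsilon^{-(r-2)}\) scaling. The ODE form, step-size rule, and stopping threshold here are illustrative assumptions.

```python
import numpy as np

def escape_time(eps, r, a_stop=0.5):
    """Integrate da/dt = a**(r-1) from a(0) = eps until a reaches a_stop."""
    a, t = eps, 0.0
    while a < a_stop:
        dt = 0.01 * a ** (2 - r)   # keep the relative increment da/a near 1%
        a += a ** (r - 1) * dt
        t += dt
    return t

r = 4
eps_grid = np.array([3e-2, 1e-2, 3e-3])
taus = np.array([escape_time(e, r) for e in eps_grid])
slope = np.polyfit(np.log(eps_grid), np.log(taus), 1)[0]
print(f"fitted exponent: {slope:.2f}  (theory: {-(r - 2)})")
```

With \(r = 4\) the fitted log-log slope comes out near \(-2\), matching \(\tau_\star = \Theta(\varepsilon^{-(r-2)})\).
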
Rao-Blackwellized Score Matching on Manifolds
Divit Rawal
Preprint, 2026.
We study the tangent channel of denoising score matching (DSM) when the latent law is supported on a smooth embedded submanifold \(M \subset \mathbb{R}^D\). For ambient Gaussian corruption, this channel is the projected denoising residual \(T_\sigma \doteq P_{T_{\pi(X)},M}(Z - X)/\sigma^2\), whose conditional variance diverges at rate \(d/\sigma^2\) as \(\sigma \to 0^+\). Within the class of fiber-collapsing summaries \(S(X)\), we identify the nearest-point projection \(\pi(X)\) as the canonical finest such summary, so that \(r_\sigma(z) \doteq \mathbb{E}[T_\sigma \mid \pi(X) = z]\) is the unique \(L^2\)-optimal Rao–Blackwellized predictor of \(T_\sigma\) in this class; the singular \(d/\sigma^2\) term is an irreducible Bayes-risk floor for any coarser summary. Expanding this canonical target in \(\sigma\), we show \(r_\sigma(z) = \nabla_M \log q(z) + \sigma^2[b_q(z) + g_M^{\mathrm{ext}}(z)] + o(\sigma^2)\), where \(g_M^{\mathrm{ext}}(z) = \bigl(\tfrac{1}{2}W_{H(z)} - \mathrm{Ric}_z^\flat\bigr)\nabla_M \log q(z)\), uniformly on \(M\). Here \(b_q\) is the intrinsic Tweedie term and the extrinsic term combines the Weingarten operator in the mean-curvature direction with the Ricci endomorphism. This extrinsic correction is absent from intrinsic-noising analyses and is the \(\sigma^2\) obstruction to recovering the Riemannian score by ambient DSM. On \(S^d\) it collapses to the scalar \((1 - d/2)\,\mathrm{Id}\), which vanishes at \(d = 2\), a common test case for manifold DSM. These results separate the removable singular variance of the raw ambient target from the intrinsic and extrinsic second-order biases of the canonical target.
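
A minimal Monte Carlo check of the \(d/\sigma^2\) variance floor (my sketch, not from the paper): on \(S^2 \subset \mathbb{R}^3\), project the Gaussian denoising residual at a fixed base point onto its tangent plane. The small-\(\sigma\) linearization replaces \(P_{T_{\pi(X)},M}\) with the projector at the base point; the base point, noise scale, and sample count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma, n = 2, 3, 0.05, 200_000
z = np.array([0.0, 0.0, 1.0])        # fixed base point on S^2 in R^3
P = np.eye(D) - np.outer(z, z)       # projector onto the tangent plane T_z S^2
xi = rng.standard_normal((n, D))     # ambient Gaussian corruption directions
T = -(xi @ P) / sigma                # projected residual P(z - x)/sigma^2 with x = z + sigma*xi
print((T ** 2).sum(axis=1).mean())   # empirical second moment, ~ 800
print(d / sigma**2)                  # predicted divergence rate d/sigma^2 = 800
```
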
Minimax Rates for Hyperbolic Hierarchical Learning
Divit Rawal, Sriram Vishwanath
Preprint, 2026.
We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-\(R\) hierarchies with branching factor \(m\), we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as \(\exp(\Omega(R))\) to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit \(O(1)\)-Lipschitz realizability, enabling learning with \(n = O(mR \log m)\) samples. A matching \(\Omega(mR \log m)\) lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-\(k\) prediction space captures only \(O(k)\) canonical hierarchical contrasts.
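
A toy contrast behind the volumetric obstruction (my illustration, not the paper's construction): \(N = m^R\) points placed on a hyperbolic circle of radius \(\approx \log N\) in the Poincaré disk keep \(\Theta(1)\) hyperbolic separation, while any placement in a bounded Euclidean disk forces neighbor gaps of order \(1/N\). The circle placement and radius choice are illustrative assumptions.

```python
import numpy as np

def poincare_dist(u, v):
    """Hyperbolic distance between points of the Poincare disk (rows of u, v)."""
    duv = np.sum((u - v) ** 2, axis=-1)
    den = (1 - np.sum(u * u, axis=-1)) * (1 - np.sum(v * v, axis=-1))
    return np.arccosh(1 + 2 * duv / den)

m, R = 2, 12
N = m ** R                                   # number of leaves
theta = 2 * np.pi * np.arange(N) / N
rho = np.log(N) + 1.0                        # hyperbolic radius ~ R log m
r = np.tanh(rho / 2)                         # corresponding Poincare-disk radius
pts = r * np.stack([np.cos(theta), np.sin(theta)], axis=-1)
print(poincare_dist(pts[:-1], pts[1:]).min())  # stays Theta(1) as R grows
print(2 * r * np.sin(np.pi / N))               # Euclidean gap shrinks like 1/N
```

The persistent \(\Theta(1)\) hyperbolic separation is exactly the room that permits \(O(1)\)-Lipschitz realizability of hierarchical targets.
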
ALPHANSO: Open-Source Modeling of (\(\alpha\),n) Neutron Source Terms
Under review, Nuclear Instruments and Methods in Physics Research Section A.
American Nuclear Society Student Conference, 2026 [poster].
└ 🏆 Best Paper: Mathematics, Computation, and AI Applications.
Institute of Nuclear Materials Management Annual Meeting, 2026.
Applications ranging from nuclear safeguards to dark matter detection require accurate predictions of neutron fields produced by (\(\alpha\),n) reactions. Legacy tools like SOURCES-4C remain widely used but suffer from significant limitations, including outdated nuclear data, missing target nuclides, and restricted accessibility. Here, we present ALPHANSO, an open-source Python package for calculating (\(\alpha\),n) neutron source terms. ALPHANSO incorporates modern nuclear data libraries and formats covering all naturally occurring target nuclides and provides a transparent, modular framework for updating or extending the data as new evaluations are released. Comparison with SOURCES-4C and experimental measurements across a range of elements and materials shows that ALPHANSO reproduces neutron yields and spectra with accuracy that typically matches or exceeds that of existing codes. These results demonstrate that ALPHANSO is a reliable, accessible, modern replacement for legacy (\(\alpha\),n) source term codes. Its open-source design and modular data handling make it readily extensible to future evaluated nuclear data and low-background applications.
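
For orientation, the core quantity such codes evaluate is the thick-target yield integral \(Y(E_0) = \int_0^{E_0} \sigma(E)/\varepsilon(E)\,dE\), with \(\sigma\) the (\(\alpha\),n) cross section and \(\varepsilon\) the stopping cross section per target atom. The sketch below (mine; not ALPHANSO's API, data, or internals) evaluates this integral with placeholder curve shapes.

```python
import numpy as np

# Toy cross section and stopping cross section -- illustrative shapes only,
# NOT evaluated nuclear data and NOT ALPHANSO's interface.
def sigma_an(E):
    """(alpha,n) cross section [cm^2], toy threshold shape near 1 MeV."""
    return np.where(E > 1.0, 1e-25 * (1.0 - np.exp(-(E - 1.0))), 0.0)

def eps_stop(E):
    """Stopping cross section [MeV cm^2 / atom], toy 1/sqrt(E) shape."""
    return 1e-15 / np.sqrt(np.maximum(E, 1e-3))

E0 = 5.0                                  # incident alpha energy [MeV]
E = np.linspace(0.0, E0, 2001)
f = sigma_an(E) / eps_stop(E)             # integrand sigma/eps, units 1/MeV
Y = np.sum((f[1:] + f[:-1]) * np.diff(E)) / 2.0   # trapezoidal rule
print(f"thick-target yield (toy data): {Y:.3e} neutrons per alpha")
```
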

I have also contributed to Foundation-Sec-8B-Instruct and Foundation-Sec-8B during an internship at Cisco Foundation AI, as well as to two ATLAS projects, off-shell Higgs production via neural simulation-based inference (SBI) and neural SBI for parameter estimation, as a researcher in the Whiteson lab.