Papers
arxiv:2603.06922

NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

Published on Mar 6 · Submitted by Nandan Kumar Jha on Mar 13

Abstract

NerVE provides a unified framework for analyzing feed-forward network dynamics in large language models through spectral analysis metrics that reveal information flow organization and optimization impacts across architectures.

AI-generated summary

We introduce NerVE, a unified eigenspectral framework for understanding how feed-forward networks (FFNs) in large language models (LLMs) organize and regulate information flow in high-dimensional latent space. Despite FFNs dominating the parameter budget, their high-dimensional dynamics remain poorly understood. NerVE addresses this gap through lightweight, memory-efficient tracking of eigenspectrum dynamics via four complementary metrics: Spectral Entropy (dispersion), Participation Ratio (effective dimensionality), Eigenvalue Early Enrichment (top-heaviness), and Jensen-Shannon divergence (distributional shifts). Our key insight is that FFN nonlinearities reinject variance across eigenmodes, fundamentally governing latent dimension utilization, and that optimizer geometry strongly modulates the extent of this variance reinjection. We validate NerVE across model scales and diverse architectural and optimizer configurations, each of which uniquely shapes FFN dynamics: normalization schemes controlling variance flow; FFN weight geometries constraining latent space; positional encoding and activation functions regulating information flow; and optimizer choices redistributing effective capacity across depth. Across these settings, NerVE consistently recovers stable spectral signatures that correlate with a model's generalization ability and respond predictably to design choices, generalize beyond transformers to MLP-Mixer architectures, and provide actionable insights for architectural and optimizer choices beyond trial and error.
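As a rough illustration, the four metrics can all be computed from the eigenvalues of an activation covariance matrix. The sketch below assumes standard textbook definitions (Shannon entropy of the normalized spectrum, the usual participation ratio, a top-k variance fraction as a stand-in for Eigenvalue Early Enrichment, and Jensen-Shannon divergence between two normalized spectra); the paper's exact estimators and normalizations may differ, and the cutoff `k` is an assumption for illustration.

```python
import numpy as np

def spectrum(x):
    """Eigenvalues of the feature covariance of activations x (n_samples, d)."""
    xc = x - x.mean(axis=0, keepdims=True)
    cov = xc.T @ xc / (len(x) - 1)
    # eigvalsh: symmetric eigensolver; clip tiny negatives from round-off
    return np.clip(np.linalg.eigvalsh(cov), 0.0, None)

def spectral_entropy(ev):
    """Shannon entropy of the normalized eigenspectrum (dispersion)."""
    p = ev / ev.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def participation_ratio(ev):
    """Effective dimensionality: (sum lambda)^2 / sum lambda^2."""
    return float(ev.sum() ** 2 / (ev ** 2).sum())

def early_enrichment(ev, k=10):
    """Hypothetical proxy for top-heaviness: variance fraction in top-k modes."""
    top = np.sort(ev)[::-1][:k]
    return float(top.sum() / ev.sum())

def js_divergence(ev_a, ev_b):
    """Jensen-Shannon divergence between two normalized spectra (same length)."""
    p, q = ev_a / ev_a.sum(), ev_b / ev_b.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float((a[mask] * np.log(a[mask] / b[mask])).sum())
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For intuition: a perfectly flat spectrum of dimension d gives entropy log(d) and participation ratio d, while a one-hot spectrum gives entropy 0 and participation ratio 1.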

Community


Hi everyone!

I am excited to share our ICLR 2026 paper, NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks. Here are some interesting findings about the role of FFN nonlinearity in the transformer architecture (we also verified them on a non-transformer architecture, MLP-Mixer):

  1. FFN nonlinearities are secretly fighting a war inside your transformer. Self-attention collapses rank doubly exponentially with depth (Dong et al., ICML 2021). We find that FFN nonlinearities fight back by reinjecting variance into under-utilized dimensions, a process we call nonlinearity-induced rank inflation, which keeps the transformer network alive.

  2. AdamW makes your nonlinearities work harder but achieve less, compared to Muon. Under AdamW, FFN nonlinearities spend their capacity repairing spectral damage (an ill-conditioned pre-activation eigenspectrum). Muon, on the other hand, preserves healthy spectra (a well-conditioned pre-activation eigenspectrum), so nonlinearities only need to refine.

  3. You can predict generalization with a single forward pass, no eval set needed. Our spectral metrics (spectral entropy and participation ratio) correlate with validation loss at |r| > 0.97 throughout training. Short runs can even rank architectural configurations before training to convergence.
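The rank-inflation effect in finding 1 is easy to see in a toy setting. The sketch below is illustrative only, not the paper's setup: it builds low-rank "pre-activations" (mimicking attention-induced rank collapse) and checks that a pointwise nonlinearity (ReLU here, chosen for simplicity) spreads variance into new eigendirections, raising the numerical rank of the centered activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank "pre-activations": n tokens, d channels, but variance confined
# to an r-dimensional subspace, as after attention-induced rank collapse.
n, d, r = 2048, 64, 4
z = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))

def numerical_rank(x):
    """Numerical rank of the centered activation matrix."""
    return np.linalg.matrix_rank(x - x.mean(axis=0))

rank_pre = numerical_rank(z)                    # == r: collapsed subspace
rank_post = numerical_rank(np.maximum(z, 0.0))  # > r: the nonlinearity
                                                # reinjects variance into
                                                # previously unused directions
print(rank_pre, rank_post)
```

Because ReLU is not a linear map, its outputs are generically linearly independent functions of the r latent coordinates, so the post-activation covariance occupies many more eigendirections than the pre-activation one.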

