The Hidden Geometry of Transformer Weights:
A Journey Inside the Black Box

Gianluca Gernone
June 2026

Abstract

In this paper I report a hidden geometric structure inside a Transformer language model, visible only after rotating the model into a coordinate system aligned with its own weight matrices. In this canonical basis of dimension D=896*, the hidden space reveals: a bipolar oscillator that alternates rhythmically between knowledge-receptive and filtering modes across layers, a phenomenon I call CBLL** (Layer-Level Bipolar Control); a homeostatic mechanism that erases any intermediate perturbation within two layers; a spectral collapse in which the magnitudes of the singular values are redundant; and a correlation-based knowledge representation in which facts are encoded in patterns of co-activation rather than on individual axes. I describe what I observe, when and why it occurs, and the architectural constraints that produce it. The canonical-basis realignment is lossless: the realigned model reproduces the original scores on both MMLU (47.50% → 47.50%) and MMLU Pro (15.50% → 15.50%). In the canonical basis, the realigned model can be compressed to roughly half its size without degradation through per-axis geometric quantization.***

* For the model studied here. This paper focuses on Qwen 2.5 0.5B, chosen because it is simpler to inspect and less demanding on hardware, which makes the research easier to advance. The method has also passed tests on Transformer encoder, decoder, decoder-only, encoder-only, and ViT architectures.
** Layer-level control that resembles fine motor control or respiration: a precise coordination of which axes activate and when. See Section 2.2.
*** Compressible to ~502 MB in geometric int8 (vs 988 MB for the original bf16 checkpoint) with preserved quality, deterministically and without calibration data. The canonical basis enables differentiated per-axis precision.

1. Introduction: Looking from the Right Angle

A Transformer language model is a function f : V^N → Δ(V) that maps a sequence of tokens to a probability distribution over the next token. Internally, it routes information through a sequence of layers, each applying self-attention followed by a feed-forward network, all mediated by residual connections and layer normalization.

The internal state is a vector in R^d — the hidden dimension or residual stream. Everything the model knows, every fact it retrieves, every inference it makes, passes through this 896-dimensional space (for a Qwen 2.5 0.5B). But the basis in which this space is represented is arbitrary — it is whatever the training process converged to. There is no reason to expect the standard basis vectors e_i to correspond to anything important.

I wondered: what if I rotated the entire model so that the coordinate axes align with the directions the model itself considers important?

This required solving two problems. The first: Transformers use RMSNorm with learned per-channel gains, which break rotational symmetry — you cannot simply rotate the model because the gains would no longer commute with the rotation. The second: you must identify which directions are important and which rotation maps them onto the coordinate axes.

The first problem has a known solution: absorb the gains into the adjacent weight matrices, making the normalizations uniform. This is a lossless algebraic transformation.
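The gain absorption can be checked on a toy example. The following is a minimal sketch, not the code used in the experiments: the sizes, the `rmsnorm` helper, and the row-vector convention `y @ W` are assumptions chosen for illustration.

```python
import numpy as np

def rmsnorm(x, gain=None, eps=1e-6):
    """RMSNorm over the last axis; `gain` is the learned per-channel scale."""
    y = x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return y if gain is None else y * gain

rng = np.random.default_rng(0)
d, d_out = 8, 5                       # toy sizes, not the model's 896
x = rng.normal(size=(3, d))           # three hidden states
g = rng.uniform(0.5, 2.0, size=d)     # stand-in for learned gains
W = rng.normal(size=(d, d_out))       # next linear layer, applied as y @ W

# Original path: gained norm followed by the linear layer.
out_original = rmsnorm(x, g) @ W

# Absorbed path: gain-free norm followed by diag(g) @ W.
W_absorbed = g[:, None] * W           # scales row i of W by g[i]
out_absorbed = rmsnorm(x) @ W_absorbed

max_abs_diff = np.max(np.abs(out_original - out_absorbed))
print(max_abs_diff)                   # rounding-level difference
```

The two paths agree up to floating-point rounding because `(y * g) @ W` and `y @ (diag(g) @ W)` are the same product.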

The second problem is empirical. I computed the singular value decomposition (SVD) of the feed-forward down-projection matrix in every layer and measured the pairwise alignment of their left singular vectors U. I found that U[:,0] from layer i aligns with U[:,0] from layer j with a mean cosine similarity of 0.944 across 528 layer pairs. The maximum is 0.991. The model possesses a shared set of preferred directions.
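The alignment measurement can be sketched as follows. This is an illustration on synthetic matrices, not the measurement pipeline itself: the matrices are random stand-ins with a planted shared direction, the shape (hidden, intermediate) is assumed, and taking the absolute value of the cosine (because singular vectors are defined only up to sign) is my assumption.

```python
import numpy as np
from itertools import combinations

def top_left_singular_vector(W):
    """Return U[:, 0] of W, the dominant direction in the hidden space."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, 0]

def pairwise_alignment(mats):
    """Mean and max |cosine| between the top left singular vectors of all pairs."""
    tops = [top_left_singular_vector(W) for W in mats]
    sims = [abs(float(a @ b)) for a, b in combinations(tops, 2)]
    return float(np.mean(sims)), float(np.max(sims)), len(sims)

rng = np.random.default_rng(0)
d, d_ff, n_layers = 16, 32, 4          # toy sizes
shared = rng.normal(size=d)
shared /= np.linalg.norm(shared)       # planted shared direction

# Each synthetic "layer" is a strong rank-1 term along `shared` plus noise.
mats = [
    10.0 * np.outer(shared, rng.normal(size=d_ff)) + rng.normal(size=(d, d_ff))
    for _ in range(n_layers)
]

mean_sim, max_sim, n_pairs = pairwise_alignment(mats)
print(mean_sim, max_sim, n_pairs)
```

With real weights, `mats` would hold one down-projection matrix per layer, each with the hidden dimension on the rows.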

A rotation built from these directions, applied consistently to all matrices, produces the canonical basis. In this basis, axis k of the hidden state corresponds to a specific spectral direction. The transformation is functionally lossless — I verify this experimentally in Section 3.
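Why a consistent rotation leaves the function unchanged can be seen on a single toy block. The sketch below assumes a gain-free RMSNorm, a ReLU in place of SwiGLU, and a random orthogonal matrix `Q` standing in for the rotation built from the singular directions; none of these are taken from the actual pipeline.

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    """Gain-free RMSNorm over the last axis."""
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def block(x, W_in, W_out):
    """Toy pre-norm residual block: x + relu(norm(x) @ W_in) @ W_out."""
    return x + np.maximum(rmsnorm(x) @ W_in, 0.0) @ W_out

rng = np.random.default_rng(0)
d, d_ff = 8, 16
x = rng.normal(size=(4, d))
W_in = rng.normal(size=(d, d_ff))      # reads from the residual stream
W_out = rng.normal(size=(d_ff, d))     # writes to the residual stream
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix

# Rotated model: states become x @ Q, readers become Q.T @ W_in,
# writers become W_out @ Q. The inner (d_ff) dimension is untouched.
y = block(x, W_in, W_out)
y_rot = block(x @ Q, Q.T @ W_in, W_out @ Q)

rotation_error = np.max(np.abs(y_rot - y @ Q))
print(rotation_error)                  # rounding-level difference
```

The rotated block returns exactly the rotated output because an orthogonal `Q` preserves the root-mean-square of `x`, so the gain-free norm commutes with it.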

Once inside, the structure becomes visible.

2. What You See

2.1 The Rich Club and the Bipolar Oscillator

In the canonical basis, each axis i of the 896-dimensional hidden space can be measured independently. I collected the activations from feed-forward outputs across thousands of tokens, computed the 896 × 896 Pearson correlation matrix C of per-axis activations, and analyzed its spectral structure.

The dominant eigenvector of C divides the axes into two poles:

Property      Positive Pole          Negative Pole
Axes          309                    292
Top axes      [9, 262, 62, 8, 629]   [2, 6, 570, 4, 207]
Mean energy   0.250                  0.260

The two poles are near-perfect mirrors in energy. Of the 309 positive-pole axes, 83% possess a dedicated inhibitory partner in the negative pole — an axis with which they are strongly anti-correlated. The master pair, axis 62 ↔ axis 570, has ρ = −0.97: when one fires, the other is silenced.

The remaining 295 axes are neutral — they show weak correlation with both poles.
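The pole assignment can be illustrated on synthetic activations. In this sketch the `threshold` separating polar from neutral axes is a hypothetical parameter (the exact criterion is not specified here), and the data are generated with a planted latent driver.

```python
import numpy as np

def bipolar_split(acts, threshold):
    """acts: (n_tokens, n_axes). Split axes by sign of the dominant eigenvector of C."""
    C = np.corrcoef(acts, rowvar=False)        # Pearson correlation between axes
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    v = eigvecs[:, -1]                         # dominant eigenvector
    pos = np.flatnonzero(v > threshold)
    neg = np.flatnonzero(v < -threshold)
    neutral = np.flatnonzero(np.abs(v) <= threshold)
    return C, v, pos, neg, neutral

rng = np.random.default_rng(0)
n_tokens = 2000
z = rng.normal(size=n_tokens)                  # latent driver shared by both poles
acts = rng.normal(size=(n_tokens, 9))          # nine toy axes
acts[:, :3] += 3.0 * z[:, None]                # axes 0-2 follow the driver
acts[:, 3:6] -= 3.0 * z[:, None]               # axes 3-5 oppose it; 6-8 stay neutral

C, v, pos, neg, neutral = bipolar_split(acts, threshold=0.2)
print(pos, neg, neutral)
```

The overall sign of an eigenvector is arbitrary, so which group is labelled "positive" depends on a convention that has to be fixed separately.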

This is a rich club: high-energy axes correlate more strongly with other high-energy axes. The mechanism is self-reinforcing: correlation leads to co-activation, co-activation produces shared gradients during training, shared gradients increase energy, and higher energy increases correlation. The effective rank of C is 11 — only eleven independent correlation patterns exist in the entire model.
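The text does not state which definition of effective rank is used. One common choice, assumed here purely for illustration, is the exponential of the Shannon entropy of the normalized eigenvalue spectrum.

```python
import numpy as np

def effective_rank(C):
    """exp(entropy) of the normalized eigenvalues of a symmetric PSD matrix."""
    eig = np.clip(np.linalg.eigvalsh(C), 0.0, None)   # drop tiny negative rounding
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

identity_rank = effective_rank(np.eye(6))        # six independent axes
ones_rank = effective_rank(np.ones((6, 6)))      # one shared pattern
print(identity_rank, ones_rank)
```

Under this definition, six uncorrelated axes give an effective rank of 6 and six perfectly correlated axes give 1, so a value of 11 for 896 axes would indicate a strongly concentrated spectrum.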

Why it happens. The correlation structure emerges from the training dynamics. The axes that the embedding layer activates most strongly receive more gradient signal, strengthen their weights, and activate even more intensely. The axes whose activations are anti-correlated develop a competitive relationship — they specialize for complementary functions. The axes that never enter this feedback loop remain outside the club — 646 of the 896 axes are effectively isolated.

2.2 CBLL — Layer-Level Bipolar Control

The bipolar oscillator is not static across the layers. It breathes.

I measured the ratio between positive-pole and negative-pole activation in each of the 24 layers, for prompts spanning factual questions, code generation, casual conversation, and random character sequences. The pattern is invariant to content:

Phase     Layers   POS/NEG Ratio   Interpretation
Encode    0–4      1.36            POS dominant: the model receives information
Process   5–20     0.42–0.88       NEG dominant: the model filters and selects
Decode    21       1.28            POS peak: the model prepares the response
Output    22–23    0.54            NEG dominant: filtered output emitted
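A per-layer ratio of this kind can be computed as sketched below. The choice of mean absolute activation as the measure of pole strength is an assumption, and the tensor is a hand-built toy.

```python
import numpy as np

def pole_ratio_per_layer(hidden, pos_idx, neg_idx):
    """hidden: (n_layers, n_tokens, n_axes). Ratio of mean |activation|, POS over NEG."""
    mag = np.abs(hidden).mean(axis=1)                  # (n_layers, n_axes)
    return mag[:, pos_idx].mean(axis=1) / mag[:, neg_idx].mean(axis=1)

hidden = np.ones((3, 5, 4))        # 3 layers, 5 tokens, 4 axes
hidden[0, :, :2] *= 2.0            # layer 0: positive pole twice as strong
hidden[1, :, 2:] *= 4.0            # layer 1: negative pole four times as strong

ratios = pole_ratio_per_layer(hidden, pos_idx=[0, 1], neg_idx=[2, 3])
print(ratios)                      # [2.0, 0.25, 1.0]
```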

The same five layers — [21, 3, 23, 2, 22] — are the top activators for every prompt tested. The respiration is not a response to the content. It is a structural signature of the model's operation, like a heartbeat.

When it happens. The encoding explosion at layers 2–3 coincides with the transition from the embedding to transformer processing. The decoding peak at layer 21 coincides with the final layers before the language model head. The dominance of the negative pole during processing (75% of the layers) suggests that most of the model's computation is spent filtering and selecting rather than retrieving.

Why it happens. The bipolar oscillator is the dominant "eigenmode" of the correlation structure. It emerges because the residual stream must carry information across 24 layers without dissolving: the positive-pole axes provide signal propagation, while the negative-pole axes provide signal regulation. Without the bipolar structure, the residual stream would either amplify noise (all positive) or suppress all signal (all negative).

2.3 Homeostasis: Why You Cannot Push the Residual Stream

I attempted to amplify the activation of a specific canonical axis by applying a multiplicative gain at the output of a specific layer.

Gain 5× applied to axis a at layer 0:
  Layer 0:  5.0×  ← the amplification works locally
  Layer 1:  1.4×  ← already diluted
  Layer 2:  1.0×  ← completely erased
  Layers 3–23: 1.0×  ← no trace remains

The residual stream is homeostatic: any perturbation of the intermediate activations is annulled within two layers. Three architectural mechanisms enforce this:

1. RMSNorm divides the hidden state by its root-mean-square before every attention and feed-forward sublayer (Qwen 2.5 uses a pre-norm layout). If one axis is amplified, the normalization scales down all axes to compensate, keeping the root-mean-square of the normalized vector at 1.0.

2. The attention projections are trained on the statistical distribution of the hidden states. An amplified axis produces query/key/value vectors outside the expected range; the attention softmax effectively ignores these outlier components.

3. The feed-forward networks operate on a specific activation range. The SiLU activation in the SwiGLU gate has an output close to zero for large negative inputs and a linear output for large positive inputs — both regimes lose information relative to the trained operating point.
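The first mechanism can be made concrete with a small numerical check. The sketch assumes a gain-free RMSNorm and an all-ones hidden state, both chosen only to keep the arithmetic transparent.

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    """Gain-free RMSNorm over the last axis."""
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

d = 896
x = np.ones(d)                      # toy hidden state
x_amp = x.copy()
x_amp[0] *= 5.0                     # 5x gain on one axis

y, y_amp = rmsnorm(x), rmsnorm(x_amp)

rms_after = float(np.sqrt(np.mean(y_amp**2)))     # back to ~1.0
other_axes_scale = float(y_amp[1] / y[1])         # < 1: other axes shrink
relative_gain = float(y_amp[0] / y_amp[1])        # ratio between axes after the norm
print(rms_after, other_axes_scale, relative_gain)
```

In this toy case the normalization restores the overall scale and shrinks the untouched axes slightly, but the ratio between the amplified axis and the others is unchanged by the norm itself. That is consistent with the erasure being attributed to the three mechanisms acting together rather than to RMSNorm alone.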

This is not a bug. It is a stability property essential for training deep Transformers. Without homeostasis, small perturbations in the early layers would amplify across 24 layers, producing chaotic outputs. The residual connection x_{l+1} = x_l + F(x_l), combined with RMSNorm, creates a dynamical system with a strong attractor at the trained operating point. The model is designed to resist perturbations.

The consequence is that you cannot steer the model by injecting signals into the intermediate layers. The model will erase the intervention. The only way to modify the behaviour is to modify the weights — the transformations themselves — rather than the signals that flow through them.

2.4 Spectral Collapse: S is Redundant

The feed-forward hidden states can be organized as a 3-tensor H ∈ R^{L × N × d} where L = 24 are the layers, N is the number of tokens, and d = 896 is the hidden dimension. A Tucker decomposition reveals the cross-layer structure:

Geometric rank (U): 22 of 24 layers. The directions of the singular vectors — where information is projected — vary independently across the layers. The geometry is not compressible.

Spectral rank (S): 1 of 24 layers. The magnitudes of the singular values — how much information is projected — are almost identical across all layers. A single vector S explains 90% of the cross-layer variance. The ratio between the first and second cross-layer singular value is 23.2×.
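A simplified stand-in for the spectral-rank measurement is sketched below. It does not perform a Tucker decomposition: it stacks one vector of singular values per layer into a matrix and takes an ordinary SVD of that matrix, which is an assumption made for illustration, as are the synthetic spectra.

```python
import numpy as np

def cross_layer_spectrum(S):
    """S: (n_layers, k), row l holds the singular values of layer l."""
    sv = np.linalg.svd(S, compute_uv=False)
    ratio = float(sv[0] / sv[1])                        # first vs second component
    explained = float(sv[0]**2 / np.sum(sv**2))         # energy in the first component
    return ratio, explained

rng = np.random.default_rng(0)
n_layers, k = 24, 64
base = np.sort(rng.uniform(1.0, 10.0, size=k))[::-1]    # one shared spectrum
# Every layer reuses the shared spectrum with 2% multiplicative jitter.
S = base[None, :] * (1.0 + 0.02 * rng.normal(size=(n_layers, k)))

ratio, explained = cross_layer_spectrum(S)
print(ratio, explained)
```

When all layers share nearly the same spectrum, the stacked matrix is close to rank one, which shows up as a large ratio and an explained fraction close to 1.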

When I replace the singular values with randomly shuffled values and reconstruct, the model still produces coherent text. The model does not care about the exact magnitudes of the singular values. It operates on geometry, not on intensity.
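The shuffling operation itself is simple to express. The sketch below applies it to a single random matrix; how the shuffled matrices are put back into the model, and how coherence of the generated text is judged, is outside its scope.

```python
import numpy as np

def shuffle_singular_values(W, rng):
    """Keep U and V, permute the singular values, and rebuild the matrix."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    S_shuffled = rng.permutation(S)
    W_new = (U * S_shuffled) @ Vt        # scales column j of U by S_shuffled[j]
    return W_new, S, S_shuffled

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 12))
W_shuffled, S, S_shuffled = shuffle_singular_values(W, rng)

frob_original = float(np.linalg.norm(W))
frob_shuffled = float(np.linalg.norm(W_shuffled))
print(frob_original, frob_shuffled)      # equal up to rounding
```

The shuffle preserves the singular directions and the Frobenius norm; only the pairing between directions and magnitudes changes.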

The residual connections x_{l+1} = x_l + F(x_l) mean that each layer adds a correction to the hidden state. The magnitude of the correction matters less than its direction — the residual structure accumulates directional changes across the layers regardless of their individual magnitudes. The singular values reflect the scale of each layer's contribution, which is normalized by RMSNorm and is therefore redundant.

2.5 Knowledge is in Correlations, Not in Axes

I identified the axes with the highest per-axis entropy (computed from activation histograms over many tokens) as "knowledge carriers" — they show the most structured and least random activation patterns. I found 268 such axes.
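A histogram-based per-axis entropy can be computed as in the following sketch. The number of bins and the per-axis bin range are assumptions, since neither is specified here, and the three test axes are synthetic.

```python
import numpy as np

def axis_entropy(acts, bins=32):
    """Shannon entropy (nats) of each axis's activation histogram.

    acts: (n_tokens, n_axes). Bins span each axis's own min..max range.
    """
    out = np.empty(acts.shape[1])
    for i in range(acts.shape[1]):
        counts, _ = np.histogram(acts[:, i], bins=bins)
        p = counts[counts > 0] / counts.sum()
        out[i] = -np.sum(p * np.log(p))
    return out

rng = np.random.default_rng(0)
n = 5000
acts = np.column_stack([
    rng.uniform(-1.0, 1.0, n),                    # spread evenly over its range
    rng.normal(size=n),                           # concentrated near zero
    np.where(rng.random(n) < 0.5, -1.0, 1.0),     # only two values
])

H = axis_entropy(acts)
print(H)
```

With this estimator, a higher value means the activations are spread more evenly across the histogram bins, so the ranking depends on the binning choice.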

Of these, only 27 are in the low-energy "empty" set. The other 241 overlap with already active axes. Knowledge is not stored in dedicated axes — it is distributed across axes that participate in many functions.

The effective rank of the correlation matrix is 11. This means that the model's entire knowledge structure — everything it knows about facts, reasoning, language and style — can be described by 11 independent correlation patterns across 896 axes.

Why it happens. During training, the model does not learn "axis 42 encodes Paris-is-the-capital-of-France". It learns that when certain axes activate for geographic queries, other axes co-activate for the European-cities context, while still others are suppressed to avoid conflicting knowledge. The pattern of co-activation is the knowledge. Individual axes are letters; the correlation structure is the language.

3. Experimental Verification

All experiments use Qwen 2.5 0.5B Instruct, a 24-layer Transformer with hidden dimension 896, 14 attention heads, and SwiGLU feed-forward networks. The realignment has also been verified on additional models, including Qwen 2.5 1.5B, Llama 3.2 3B, SmolLM2 1.7B, Phi-3.5 mini, Granite 3.2 2B, Gemma 2 2B, Falcon3 3B, and Qwen3.5 4B.

3.1 Losslessness of the Realignment

I compared the original model against its realigned counterpart on two benchmarks:

Benchmark            Questions   Options   Original   Realigned
MMLU (12 subjects)   240         4         47.50%     47.50%
MMLU Pro             200         10        15.50%     15.50%

On MMLU, both models answered exactly the same 114 of 240 questions correctly: not just the same score, but the same individual answers. On MMLU Pro, the running accuracy is identical throughout (16.00% after 100 questions, 15.50% after 200 questions). At the level of benchmark answers, the realignment is lossless.
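The distinction between equal scores and equal answers can be checked with a few lines. The answer lists below are made-up placeholders, not benchmark data.

```python
def compare_runs(preds_a, preds_b, gold):
    """Return (accuracy_a, accuracy_b, fraction of questions answered identically)."""
    assert len(preds_a) == len(preds_b) == len(gold)
    n = len(gold)
    acc_a = sum(p == g for p, g in zip(preds_a, gold)) / n
    acc_b = sum(p == g for p, g in zip(preds_b, gold)) / n
    agreement = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    return acc_a, acc_b, agreement

gold = list("ABCDABCD")          # hypothetical answer key
original = list("ABCDAAAA")      # hypothetical predictions, original model
realigned = list("ABCDAAAA")     # hypothetical predictions, realigned model

acc_a, acc_b, agreement = compare_runs(original, realigned, gold)
print(acc_a, acc_b, agreement)   # 0.625 0.625 1.0
```

An agreement of 1.0 is the stronger statement: two models can reach the same accuracy while answering different questions correctly.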

3.2 Cross-Layer Alignment

To extend the results, I computed the pairwise cosine similarity between the U[:, 0] vectors of the feed-forward down-projection matrix across all 528 layer pairs:

Model              Mean Alignment   Max Alignment
Qwen 2.5 0.5B      0.944            0.991
Qwen 2.5 1.5B      0.801            0.953
Llama 3.2 3B       0.661            0.851
SmolLM2 1.7B       0.808            0.924
Nemotron-Mini 4B   0.400            0.923
ViT-B/16           0.230            0.967

The strength of the alignment varies by model family (mean alignment ranges from 0.230 to 0.944), but some alignment is present in every model tested. The phenomenon is not specific to one architecture.

3.3 Measuring Homeostasis

I measured the persistence of an axis-specific gain across the layers by injecting a 5× amplification at layer 0 and measuring the per-axis magnitude at each subsequent layer. The amplification decays from 5.0× at layer 0 to 1.4× at layer 1 to 1.0× at layer 2, with no detectable trace beyond layer 2. The same experiment repeated at layers 5, 12 and 21 produces identical decay curves, shifted relative to the injection layer. The two-layer decay constant is invariant to the injection position.
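The persistence measurement reduces to a ratio between a perturbed and a baseline run. In the sketch below the tensors are hand-built so that the ratios match the values reported above; the 5% tolerance used to count affected layers is an assumption.

```python
import numpy as np

def amplification_profile(baseline, perturbed, axis):
    """Per-layer ratio of mean |activation| on one axis, perturbed over baseline.

    Both inputs have shape (n_layers, n_tokens, n_axes).
    """
    num = np.abs(perturbed[..., axis]).mean(axis=1)
    den = np.abs(baseline[..., axis]).mean(axis=1)
    return num / den

baseline = np.ones((4, 3, 2))          # 4 layers, 3 tokens, 2 axes
perturbed = baseline.copy()
perturbed[0, :, 1] *= 5.0              # injection layer
perturbed[1, :, 1] *= 1.4              # one layer later

profile = amplification_profile(baseline, perturbed, axis=1)
layers_affected = int(np.sum(np.abs(profile - 1.0) > 0.05))
print(profile, layers_affected)        # [5.0, 1.4, 1.0, 1.0] 2
```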

3.4 Methods Protected by Patent

Some methods described in this paper for exploiting the observed phenomena — including specific procedures for the strategic modification of weights and the redistribution of energy at the output level — are the subject of pending patent applications. Their description here is limited to what is necessary to establish the existence of the phenomena, without enabling their reproduction.

4. Related Work

Mechanistic interpretability (Bricken et al. 2023; Templeton et al. 2024; Elhage et al. 2021) identifies individual features and circuits, for example with sparse autoencoders. These approaches analyze what activates for which inputs. My work analyzes the global geometric structure that underlies all activations.

Activation engineering (Turner et al. 2023; Zou et al. 2023) adds control vectors to model activations to influence behaviour. My discovery of homeostasis (Section 2.3) suggests that these vectors are partially erased within two layers, which may explain their variable effectiveness and the observation that control vectors must be applied simultaneously to multiple layers.

Weight matrix decomposition (Hsu et al. 2022) uses SVD for model compression. I confirm that direct SVD compression is limited by the relatively flat spectrum of the feed-forward layers (condition number ≈ 11, k_{90}/d ≈ 0.76). However, I show that the model's information is in the geometry (U subspace), not in the spectrum (S vector), opening a different path to compression.
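The two spectrum statistics quoted above can be computed as follows. The definition of k_{90} is assumed here to be the smallest number of singular values whose squared sum reaches 90% of the total, and the diagonal test matrix is a toy.

```python
import numpy as np

def spectrum_stats(W, energy=0.90):
    """Return (condition number, k at the given energy level, k / number of singular values)."""
    s = np.linalg.svd(W, compute_uv=False)         # descending order
    cond = float(s[0] / s[-1])
    cum = np.cumsum(s**2) / np.sum(s**2)           # cumulative energy fraction
    k = int(np.searchsorted(cum, energy) + 1)      # first index reaching `energy`
    return cond, k, k / len(s)

W = np.diag([4.0, 3.0, 2.0, 1.0])                  # singular values 4, 3, 2, 1
cond, k90, frac = spectrum_stats(W)
print(cond, k90, frac)                             # 4.0 3 0.75
```

A flat spectrum pushes k_{90}/d towards 1, which is why truncating singular values saves little for such matrices.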

Model merging and rebasin (Ainsworth et al. 2023; Singh & Jaggi 2020) align models using permutation symmetries. My realignment uses continuous rotation, not discrete permutation, and exploits the structure within a single model rather than between models.

Representation geometry (Park et al. 2024; Marks & Tegmark 2023) studies the structure of hidden states. My canonical basis provides a coordinate system in which to conduct such studies.

5. Limitations

This work characterizes a single model (0.5B parameters) in depth. The cross-layer alignment and the respiration phenomenon have been observed in larger models of the same family (1.5B–4B), but the detailed internal structure (rich club membership, homeostasis decay curves, spectral collapse) has been fully mapped only on the 0.5B model.

The realignment requires full-precision (float32) rotation matrices. Lower precision breaks orthogonality and introduces small errors that accumulate across the layers.
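The effect of storage precision on orthogonality can be illustrated with a random orthogonal matrix. The matrix size and the error metric (largest absolute entry of QᵀQ − I) are assumptions made for this sketch.

```python
import numpy as np

def orthogonality_error(Q):
    """Largest absolute entry of Q^T Q - I, evaluated in float64."""
    Q64 = Q.astype(np.float64)
    return float(np.max(np.abs(Q64.T @ Q64 - np.eye(Q.shape[1]))))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))     # orthogonal in float64

# Round the same matrix to lower precisions and measure the damage.
errors = {
    name: orthogonality_error(Q.astype(dtype))
    for name, dtype in [("float64", np.float64),
                        ("float32", np.float32),
                        ("float16", np.float16)]
}
print(errors)
```

The error grows by several orders of magnitude at each step down in precision, and a rotation applied to every matrix in every layer gives such errors many opportunities to compound.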

My measurements are based on the outputs of the feed-forward projections. The role of the attention mechanism in the respiration rhythm has not yet been fully characterized.

The interpretation of the bipolar oscillator as "respiration" is a functional metaphor. The precise computational role of each pole — what the positive-pole axes do compared to what the negative-pole axes do — still needs to be characterized through ablation studies.

6. Conclusion

I have shown that rotating a Transformer language model into a coordinate system aligned with its own weight matrices reveals a hidden internal structure. In this canonical basis, I observe:

1. A bipolar oscillator — 309 positive-pole axes and 292 negative-pole axes engaged in a rhythmic alternation across the layers, invariant to the input content.

2. A homeostatic mechanism — the residual stream erases any intermediate perturbation within two layers, enforced by the combined action of RMSNorm, attention statistics, and FFN operating ranges.

3. Spectral redundancy — the magnitudes of the singular values are almost identical across the layers; the model operates on geometry, not on intensity.

4. Correlation-based knowledge — facts are encoded in patterns of co-activation between axes, not in individual axis values. The entire knowledge structure has an effective rank of 11.

5. Canonical-basis realignment — a lossless transformation that makes all of the above visible.

These results suggest a path from treating language models as opaque functions towards understanding them as structured dynamical systems with observable and measurable internal states. The black box has a geometry. You just need to look from the right angle.

Acknowledgments

I thank the open-source community for the models, datasets and tools used in this research.

References

1. Bricken, T. et al. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Anthropic.
2. Templeton, A. et al. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Anthropic.
3. Turner, A. M. et al. (2023). Activation Addition: Steering Language Models Without Optimization. arXiv:2308.10248.
4. Zou, A. et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405.
5. Hsu, Y.-C. et al. (2022). Extreme Compression of Large Language Models via Low-Rank Approximation. NeurIPS.
6. Ainsworth, S. K. et al. (2023). Git Re-Basin: Merging Models modulo Permutation Symmetries. ICLR.
7. Singh, S. P. & Jaggi, M. (2020). Model Fusion via Optimal Transport. NeurIPS.
8. Elhage, N. et al. (2021). A Mathematical Framework for Transformer Circuits. Anthropic.
9. Park, K. et al. (2024). The Geometry of Categorical and Hierarchical Concepts in Large Language Models. arXiv.
10. Marks, S. & Tegmark, M. (2023). The Geometry of Truth: Emergent Linear Structure in LLM Representation. arXiv.
11. Meng, K. et al. (2022). Locating and Editing Factual Associations in GPT. NeurIPS.

Patent Notice: Methods that exploit the canonical basis for model manipulation are the subject of pending patent applications. Commercial use requires a separate license.

Blockchain Timestamp: OpenTimestamps · Bitcoin · 2026-05-31
ba622b9331b1ef5be6a78d78f0b8dc224fcf5d7bc12e7db22f87b1ab2875a865