NEXUS AETHERIAL
REFERENCE DESK · EST. 2026
LIVE FEED · ROLLING · 2026.05.24 · 06:42:18 UTC · ENTRIES TRACKED · 142

One reference desk.
All cognitive systems.

An analytical encyclopedia for frontier model architectures. Compared, scored, and digested in real time — so you can see the field move before the press release lands.

§ 02 · COGNITIVE COMPARE MATRIX

Plot two systems. Read the spread.

Six axes, normalized. Value-Ratio dial recomputes reasoning-per-dollar on every swap. Click a slot to change the model.

MODEL · A
GPT-5
OPENAI · 400K CTX · STABLE
MODEL · B
Claude Opus 4.5
ANTHROPIC · 500K CTX · STABLE

Six-axis profile

Normalized 0–100 across all tracked systems.

[Chart: six-axis profile comparing GPT-5 and Claude Opus 4.5; individual axes can be included or excluded]
VALUE RATIO · REASONING ÷ COST
A LEADS · +18%
GPT-5 leads on reasoning-per-dollar.
A · $/1M · $10.00
B · $/1M · $15.00
REASON Δ · +2
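
A rough sketch of what a reasoning-per-dollar dial could compute. The prices are the ones in the panel; the two reasoning scores are hypothetical placeholders (the page only shows a gap of 2), and `value_ratio` and `lead_percent` are illustrative names, not the desk's actual code. The page does not spell out how the dial normalizes its inputs, so a naive division like this need not reproduce the +18% shown.

```python
def value_ratio(reasoning_score, price_per_million):
    """Reasoning points per dollar, using the price per 1M tokens."""
    if price_per_million <= 0:
        raise ValueError("price must be positive")
    return reasoning_score / price_per_million


def lead_percent(ratio_a, ratio_b):
    """How far A's ratio sits above (positive) or below (negative) B's, in percent."""
    return (ratio_a / ratio_b - 1.0) * 100.0


# Prices come from the panel; the reasoning scores are hypothetical
# placeholders chosen only so that they sit 2 points apart.
ratio_a = value_ratio(reasoning_score=90.0, price_per_million=10.00)
ratio_b = value_ratio(reasoning_score=88.0, price_per_million=15.00)

leader = "A" if ratio_a >= ratio_b else "B"
print(leader, round(lead_percent(ratio_a, ratio_b), 1))
```

Swapping either model only changes the two inputs, which is why a dial like this can recompute on every swap.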
§ 03 · ANATOMY OF A MODEL

What you're actually looking at.

Scroll the column on the left. The diagram on the right keeps pace, lighting up the part of the system the text is describing. Five stations, end to end.

01 · TOKENIZE

Words become numbers.

Before the model can reason about anything, your input is split into tokens: sub-word units drawn from a vocabulary of roughly 100k–200k pieces. Depending on the tokenizer, "unforgettable" might become un · forget · table.

Each token is looked up in an embedding table, which maps it to a wide vector of learned numbers. From the model's point of view, your prompt now lives in a high-dimensional space where related ideas are geometrically close.
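
A toy sketch of both steps, under loud assumptions: the four-entry vocabulary is made up, the greedy longest-match split stands in for the merge rules a production tokenizer would learn, and the embedding rows are random rather than trained.

```python
import random

# Hypothetical toy vocabulary; real ones hold roughly 100k-200k pieces.
VOCAB = {"un": 0, "forget": 1, "table": 2, "<unk>": 3}
EMBED_DIM = 8  # real models use thousands of dimensions


def tokenize(text, vocab):
    """Greedy longest-match split of text into known sub-word pieces."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit <unk> and skip one character.
            pieces.append("<unk>")
            i += 1
    return pieces


rng = random.Random(0)
# One row of numbers per vocabulary entry (random here, learned in practice).
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed(pieces, vocab, table):
    """Replace each piece with its row from the embedding table."""
    return [table[vocab[p]] for p in pieces]


pieces = tokenize("unforgettable", VOCAB)
vectors = embed(pieces, VOCAB, embedding_table)
print(pieces)                         # ['un', 'forget', 'table']
print(len(vectors), len(vectors[0]))  # 3 8
```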

VOCAB · 100k–200k | DIM · 4096–18432
02 · ATTEND

Every token reads every other.

The transformer's defining trick: self-attention. Each token, in parallel, asks "which other tokens should I be paying attention to?" — and answers itself. Multiple attention heads ask this same question along different axes.

One head might track syntax, another might follow entities across paragraphs, a third might match code brackets. The model isn't taught which head does what; that emerges from training.
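
A minimal single-head sketch of that question-and-answer step, written as scaled dot-product attention in plain Python. It assumes the token vectors are used directly as queries, keys, and values; a real model would first pass them through learned projection matrices, run many heads side by side, and often mask out future tokens.

```python
import math


def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over a short sequence."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this token against every token, itself included.
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wi * v[c] for wi, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's answer: how much of its attention goes to each position in the sequence.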

HEADS · 32–128 | CTX · up to 2M tokens
03 · TRANSFORM

Many layers, narrow signals.

An attention layer is paired with a feed-forward network — a dense MLP that reshapes the signal token-by-token. Stack 40–120 of these blocks and you have a transformer.

In mixture-of-experts models the feed-forward layer is replaced by many parallel experts, and a router sends each token to only a few of them (two in classic designs, more in some newer ones). That is how the 671B-parameter DeepSeek-V3 runs with only about 37B parameters active per token.
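
A small sketch of the routing idea, assuming top-k gating with k = 2 and four stand-in experts that merely scale their input. The router scores are hard-coded here; in a real model they come from a learned layer, and the experts are full feed-forward networks. The last lines redo the parameter arithmetic from the paragraph above.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}


def moe_layer(token, router_logits, experts, k=2):
    """Run only the chosen experts and blend their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    out = [0.0] * len(token)
    for idx, gate in gates.items():
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, gates


# Hypothetical experts: each one just scales the token by a different factor.
experts = [lambda t, s=s: [s * x for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
out, gates = moe_layer([1.0, -1.0], [0.1, 2.0, 0.3, 1.5], experts)
print(sorted(gates))  # [1, 3]: the other two experts never run for this token

# Share of parameters touched per token when 37B of 671B are active.
active_fraction = 37 / 671
print(f"{active_fraction:.1%}")  # 5.5%
```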

LAYERS · 40–120 | EXPERTS · 8–256 (MoE)
04 · TRAIN

Three teachers in sequence.

Pretraining — trillions of tokens of text and code. The model only ever learns "predict the next token", but at that scale, learning to predict becomes learning to compress, and compression starts to look like understanding.
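
To make the prediction-compression link concrete, here is a minimal sketch of the next-token loss: the cross-entropy of a single prediction, computed from raw scores (logits). Dividing by ln 2 gives the number of bits an ideal code would spend on that token, so a lower loss means a shorter encoding. The four-entry vocabulary and the logits are made up for illustration.

```python
import math


def next_token_loss(logits, target_id):
    """Cross-entropy for one prediction: -log p(target), in nats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_id]


def nats_to_bits(nats):
    return nats / math.log(2)


# A model with no idea: every one of 4 tokens looks equally likely.
clueless = next_token_loss([0.0, 0.0, 0.0, 0.0], target_id=2)
# A model that strongly favors the token that actually comes next.
confident = next_token_loss([0.0, 0.0, 5.0, 0.0], target_id=2)

print(round(nats_to_bits(clueless), 3))   # 2.0 bits: no better than a fixed 2-bit code
print(nats_to_bits(confident) < nats_to_bits(clueless))  # True
```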

Post-training — supervised fine-tuning on curated demonstrations, then preference optimisation (RLHF, DPO). This is where the helpful assistant persona appears.

RLVR — reinforcement learning from verifiable rewards. Used heavily for reasoning models: the model gets rewarded when its chain-of-thought leads to a correct, checkable answer.
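
A sketch of what "verifiable" can mean in practice: a reward function that ignores how the reasoning reads and checks only the final answer. The `ANSWER:` marker and both function names are hypothetical conventions chosen for this example; real pipelines define their own answer formats and checkers.

```python
import re


def extract_final_answer(completion):
    """Pull the integer after the (hypothetical) 'ANSWER:' marker, if any."""
    match = re.search(r"ANSWER:\s*(-?\d+)", completion)
    return int(match.group(1)) if match else None


def verifiable_reward(completion, expected):
    """1.0 if the final answer matches the known-correct one, else 0.0."""
    return 1.0 if extract_final_answer(completion) == expected else 0.0


good = "17 + 25: add the tens, then the ones. 30 + 12 = 42. ANSWER: 42"
bad = "17 + 25 is roughly 40. ANSWER: 40"
missing = "I think it is forty-two."

print(verifiable_reward(good, 42))     # 1.0
print(verifiable_reward(bad, 42))      # 0.0
print(verifiable_reward(missing, 42))  # 0.0
```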

PRETRAIN · 15T+ tokens | POST · SFT → DPO → RLVR
05 · HARNESS

The model is the engine. The harness is the car.

What you call "GPT-5" or "Claude" in product is always a harness wrapped around model weights: a system prompt that defines its persona; a context window holding the conversation; a tool sheet listing functions it can call; a memory store; a safety filter on both sides.
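
One way to picture those parts is as a plain data structure that assembles everything sent to the weights. This is an illustrative sketch only: the `Harness` class, its field names, the message-count stand-in for a token budget, and the keyword filter are all assumptions, not any vendor's design. Only the input-side filter is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    system_prompt: str                          # defines the persona
    tools: list = field(default_factory=list)   # names of callable functions
    memory: list = field(default_factory=list)  # facts kept across sessions
    max_context_messages: int = 4               # stand-in for a token budget

    def passes_filter(self, text):
        # Placeholder input-side check; real safety filters are far more involved.
        return "BLOCKED" not in text

    def build_request(self, history, user_message):
        """Assemble what the model weights would actually receive."""
        if not self.passes_filter(user_message):
            return None
        n = self.max_context_messages
        recent = history[-n:] if n > 0 else []  # context window: keep the newest turns
        return {
            "system": self.system_prompt,
            "memory": list(self.memory),
            "tools": list(self.tools),
            "messages": recent + [user_message],
        }


harness = Harness(system_prompt="You are a concise reference clerk.", tools=["search"])
request = harness.build_request(["m1", "m2", "m3", "m4", "m5"], "What is MoE?")
print(request["messages"])  # ['m2', 'm3', 'm4', 'm5', 'What is MoE?']
```

Change any field and the same weights receive a different request, which is the point of the engine-and-car picture.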

Two products on the same weights can feel like different models. Two harnesses on the same task — agentic versus single-turn — can score 20 points apart on the same benchmark.

SYSTEM PROMPT · TOOLS · MEMORY · EVAL HARNESS
§ 04 · REFERENCE CABINET

Ten dossiers. One drawer each.

Tap a row to pull the drawer. Each dossier carries architecture, modality, and a live benchmark profile.

§ 05 · BENCHMARK DIGEST

The leaderboard, but readable.

Pick a suite from the rail. Sort any column. Δ 30d shows movement since last month.

5 MODELS · MMLU-PRO · UPDATED 06:21 UTC