Introducing NARE-1: Latent Fission Architecture

Today we're excited to announce NARE-1, our first production language model built on a novel architecture we call Latent Fission.

The Problem

Traditional language models process every token through the entire network, even when the task is simple. This wastes compute on easy predictions and underutilizes capacity on hard ones.

Our Solution: Latent Fission

NARE-1 dynamically routes computation based on semantic complexity:

Early Exit: Simple tokens (such as punctuation and common words) exit after 12 layers instead of running all 28
Expert Routing: Complex reasoning activates specialized LoRA experts (code, math, logic)
Single Signal: All decisions are driven by one metric, the entropy of the output distribution

Architecture Highlights

Base Model: Qwen2.5-1.5B (28 layers)
Router Layer: 12 (of 28, just before the midpoint of the network)
Experts: 3 specialized LoRA adapters (r=16)
Early Exit Threshold: entropy < 2.5
Top-K Experts: 2 active simultaneously
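
For readers who prefer to see these settings as code, here is a minimal sketch of the values above collected into one config object. The class name LatentFissionConfig and its field names are hypothetical, chosen for illustration; only the values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentFissionConfig:
    """Hypothetical container for the settings listed above (names are illustrative)."""

    base_model: str = "Qwen2.5-1.5B"
    num_layers: int = 28
    router_layer: int = 12           # entropy is measured after this layer
    num_experts: int = 3             # code, math, logic
    lora_rank: int = 16              # r=16 for each LoRA adapter
    early_exit_entropy: float = 2.5  # exit when entropy is below this value
    top_k_experts: int = 2           # experts active at the same time

    def __post_init__(self):
        # Basic sanity checks on the relationships between fields.
        if not 0 < self.router_layer < self.num_layers:
            raise ValueError("router_layer must fall inside the network")
        if not 0 < self.top_k_experts <= self.num_experts:
            raise ValueError("top_k_experts must be between 1 and num_experts")

    @property
    def layers_skipped_on_exit(self) -> int:
        """Number of layers an early-exiting token does not run."""
        return self.num_layers - self.router_layer


config = LatentFissionConfig()
print(config.layers_skipped_on_exit)
```

With the defaults, an early-exiting token skips 28 - 12 = 16 layers.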

Performance

Compared to the base Qwen2.5-1.5B model:

2.8x faster on average workloads (early exit rate: 45%)
Quality on par with the base model on standard benchmarks (GSM8K, HumanEval, MMLU)
Better on specialized tasks when experts activate (code: +12%, math: +8%)
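
To make the early exit rate concrete, the short calculation below estimates the average number of layers a token runs when 45% of tokens exit after layer 12. The function name average_depth is hypothetical. This counts layers only: it is not a latency model, it ignores the cost of the entropy check and of expert activation, and wall-clock speedup depends on factors it does not capture.

```python
def average_depth(exit_rate: float, exit_layer: int = 12, total_layers: int = 28) -> float:
    """Expected layers per token when a fraction `exit_rate` stops at `exit_layer`."""
    return exit_rate * exit_layer + (1.0 - exit_rate) * total_layers


depth = average_depth(0.45)      # 0.45 * 12 + 0.55 * 28
fraction_of_full = depth / 28    # share of the full network run per token
print(f"average depth: {depth:.1f} layers ({fraction_of_full:.0%} of the full network)")
```

Under these assumptions a token runs about 20.8 of the 28 layers on average.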

How It Works

Entropy Measurement: After layer 12, we compute the Shannon entropy of the next-token distribution (the softmax of the logits)
Early Exit Decision: If entropy < 2.5, we skip remaining layers and output the token
Expert Activation: Otherwise (entropy of 2.5 or more), we activate the top-2 experts based on softmax gating
In-Place Fusion: Expert deltas are added to FFN outputs without copying activations
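
The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the NARE-1 implementation: entropy is measured in nats (the unit is not stated above), the top-2 gate weights are renormalized to sum to 1, the logits at layer 12 and the gate scores are taken as given inputs, and the expert deltas are passed in precomputed rather than produced by LoRA adapters. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(logits):
    """Entropy of softmax(logits), in nats (assumed unit)."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def top_k_gate(gate_scores, k=2):
    """Pick the k highest-weighted experts; renormalizing is an assumption."""
    weights = softmax(gate_scores)
    chosen = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in chosen)
    return {i: weights[i] / norm for i in chosen}


def route(logits, gate_scores, threshold=2.5):
    """Return ("exit", {}) for low entropy, else ("experts", gate weights)."""
    if shannon_entropy(logits) < threshold:
        return "exit", {}
    return "experts", top_k_gate(gate_scores)


def fuse_in_place(ffn_out, expert_deltas, gates):
    """Add gate-weighted expert deltas into ffn_out without allocating a copy."""
    for idx, weight in gates.items():
        delta = expert_deltas[idx]
        for j in range(len(ffn_out)):
            ffn_out[j] += weight * delta[j]
    return ffn_out


# A sharply peaked distribution has low entropy and exits early.
print(route([10.0, 0.0, 0.0, 0.0], [0.1, 2.0, 1.0]))

# A uniform distribution over 100 tokens has entropy ln(100), above the threshold.
decision, gates = route([0.0] * 100, [0.1, 2.0, 1.0])
ffn = [1.0, 1.0]
fuse_in_place(ffn, [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]], gates)
print(decision, sorted(gates), ffn)
```

In the second call the gate selects experts 1 and 2 (the two largest gate scores), and their weighted deltas are written directly into the existing ffn list.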

Good luck!